Hi,
I'm having trouble building a recommender and would appreciate a few
pointers.
I have 350,000,000 events which are stored in roughly 500,000 S3 files and
are formatted as semi-structured JSON. These events are not all relevant to
making recommendations.
My code is (roughly):
case class
Hi,
My apologies for what has ended up as quite a long email with a lot of
open-ended questions, but, as you can see, I'm really struggling to get
started and would appreciate some guidance from people with more
experience. I'm new to Spark and big data in general, and I'm struggling
with what I
Hi,
I've got a bunch of data stored in S3 under directories like this:
s3n://blah/y=2015/m=01/d=25/lots-of-files.csv
In Hive, if I issue a query WHERE y=2015 AND m=01, I get the benefit that
it only scans the necessary directories for files to read.
As far as I can tell from searching and
Thanks Michael.
I'm not actually using Hive at the moment - in fact, I'm trying to avoid it
if I can. I'm just wondering whether Spark has anything similar I can
leverage?
Thanks
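
One way to get a similar effect without Hive is to build the list of partition directories yourself and read only those. The sketch below is a hypothetical helper, not something from this thread: the name partition_paths and its parameters are made up for illustration, and it assumes the y=/m=/d= layout shown above. It relies on the fact that SparkContext.textFile accepts a comma-separated list of paths and glob patterns.

```python
from itertools import product


def partition_paths(base, years, months, days=None):
    """Build one path (or glob) per selected y=/m=/d= partition directory.

    Hypothetical helper: assumes zero-padded month and day directory names,
    as in s3n://blah/y=2015/m=01/d=25/.
    """
    base = base.rstrip("/")
    # With no days given, fall back to a glob that matches every day directory.
    day_parts = ["*"] if days is None else ["%02d" % d for d in days]
    return [
        "%s/y=%04d/m=%02d/d=%s" % (base, y, m, d)
        for y, m, d in product(years, months, day_parts)
    ]


# Equivalent of WHERE y=2015 AND m=01: only one month's directories are listed.
paths = partition_paths("s3n://blah", years=[2015], months=[1])

# The joined string could then be passed to an existing SparkContext `sc`,
# e.g. sc.textFile(",".join(paths)); `sc` is assumed, not defined here.
print(",".join(paths))
```

This only prunes at the directory level, so any further filtering still has to happen after the files are read.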
Ah, well that is interesting. I'll experiment further tomorrow. Thank you for
the info!