While case classes no longer have the 22-element limitation as of Scala
2.11, tuples are still limited to 22 elements. For various technical
reasons, this limitation probably won't be removed any time soon.
However, you can nest tuples, like case classes, in most contexts. So, the
last bit of
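For example (made-up field values, just to show the shape of the nesting):

// A flat 23-element tuple would not compile, but nesting keeps each tuple at or below 22 elements.
val row: ((Int, String, Double), (Long, Boolean)) =
  ((1, "alice", 3.14), (42L, true))

// Nested elements are reached with the usual _1/_2 accessors.
val name: String = row._1._2
val flag: Boolean = row._2._2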
I would suggest not writing small files to HDFS. Rather, hold them in memory, maybe
off-heap, and then flush them to HDFS using another job, similar to
https://github.com/ptgoetz/storm-hdfs (not sure if Spark already has something like this).
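Something along these lines, perhaps (a rough, untested sketch using the plain Hadoop
FileSystem API; the path and batch naming are made-up placeholders):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical batch of small events accumulated in memory (or in an off-heap store).
def flushBatch(events: Seq[String], batchId: Long): Unit = {
  val fs = FileSystem.get(new Configuration())
  // One large file per flush instead of one tiny file per event.
  val out = fs.create(new Path(s"/data/events/batch-$batchId.txt"))
  try {
    events.foreach(e => out.write((e + "\n").getBytes("UTF-8")))
  } finally {
    out.close()
  }
}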
On Sun, Sep 27, 2015 at 11:36 PM,
Hello,
I'm still investigating the small-file problem generated by my Spark Streaming jobs.
My Spark Streaming jobs receive a lot of small events (avg. 10 KB each), and I have to
store them in HDFS so that they can be processed on demand by Pig jobs.
The problem is that I
No, you would just have to do another select to pull out the fields you are
interested in.
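Something like this (untested sketch; customer_id, item_id, and price are made-up field
names, and df stands for your DataFrame):

import org.apache.spark.sql.functions.explode

// Explode the array column into one row per element, then select just the fields of interest.
val exploded = df.select(df("customer_id"), explode(df("purchase_items")).as("item"))
val result = exploded.select("customer_id", "item.item_id", "item.price")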
On Sat, Sep 26, 2015 at 11:11 AM, Jerry Lam wrote:
> Hi Michael,
>
> Thanks for the tip. With DataFrames, is it possible to explode some
> selected fields in each purchase_items?
>
I get "Error: no methods for 'textFile'"
when I run the second command below after SparkR has been initialized:
sc <- sparkR.init(appName = "RwordCount")
lines <- textFile(sc, args[[1]])
But the following command works:
lines2 <- SparkR:::textFile(sc, "C:\\SelfStudy\\SPARK\\sentences2.txt")
In addition, it
Eugene,
The SparkR RDD API is private for now
(https://issues.apache.org/jira/browse/SPARK-7230).
You can use the SparkR::: prefix to access those private functions.
-Original Message-
From: Eugene Cao [mailto:eugene...@163.com]
Sent: Monday, September 28, 2015 8:02 AM
To:
Hi All,
Would some expert help me with this issue...
I would appreciate your kind help very much!
Thank you!
Zhiliang
On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu wrote:
Hi Alexis, Gavin,
Thanks very much for your kind comments. My
You could try a couple of things:
a) Use Kafka for stream processing: store the incoming events and the Spark Streaming
job output in Kafka rather than on HDFS, and dual-write to HDFS too in a micro-batched
mode, so every x minutes (see the sketch after this list). Kafka is better suited to
handling lots of small events/
b)
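For (a), something along these lines (rough, untested sketch; the topic name, broker
address, and 5-minute window are placeholders, and events is assumed to be your
DStream[String]):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.Minutes

// Write every event to Kafka as it arrives (one producer per partition).
events.foreachRDD { rdd =>
  rdd.foreachPartition { part =>
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    part.foreach(e => producer.send(new ProducerRecord[String, String]("events", e)))
    producer.close()
  }
}

// Dual-write to HDFS every x minutes, coalesced so each window produces a few large files.
events.window(Minutes(5), Minutes(5)).foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.coalesce(1).saveAsTextFile(s"hdfs:///data/events/window-${time.milliseconds}")
  }
}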
Is it possible to run FP-growth on stream data in its current version, or is there a
way around this?
I mean, is it possible to use/augment the old tree with the new incoming data and find
the new set of frequent patterns?
Thanks
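A rough, untested sketch of one possible workaround (not incremental mining; it just
reruns MLlib's batch FPGrowth over a sliding window of the stream; the window lengths,
minSupport, and the transactions DStream are placeholders):

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.streaming.Minutes

// transactions: DStream[String], each record assumed to be a space-separated list of items.
transactions
  .map(_.trim.split(" "))
  .window(Minutes(30), Minutes(10))
  .foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      val model = new FPGrowth().setMinSupport(0.2).setNumPartitions(4).run(rdd)
      model.freqItemsets.collect().foreach { itemset =>
        println(itemset.items.mkString("[", ",", "]") + " : " + itemset.freq)
      }
    }
  }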