Re: spark.streaming.receiver.maxRate

2017-09-15 Thread Margus Roo
Some more info:
val lines = ssc.socketStream() // works
val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_AND_DISK_SER_2)) // does not work
Margus (margusja) Roo http://margus.roo.ee skype: margusja https://www.facebook.com/allan.tuuring +372 51 48 780 On 15/09/2017

Re: spark.streaming.receiver.maxRate

2017-09-15 Thread Margus Roo
Hi, I tested the spark.streaming.receiver.maxRate and spark.streaming.backpressure.enabled settings using socketStream and they work. But if I am using nifi-spark-receiver (https://mvnrepository.com/artifact/org.apache.nifi/nifi-spark-receiver) then it does not using
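For readers following the thread, here is a minimal sketch of how the two settings named above might be applied to a socket-based stream. The host, port, batch interval, and the rate of 100 records per second are assumptions chosen for illustration; none of them come from the original message, and the sketch does not cover the NiFi receiver.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RateLimitSketch {
  // Builds a configuration with both rate-control settings from the thread.
  // The value "100" (records per second per receiver) is an assumed example.
  def buildConf(): SparkConf =
    new SparkConf()
      .setMaster("local[2]") // at least 2 threads: one for the receiver, one for processing
      .setAppName("rate-limit-sketch")
      .set("spark.streaming.receiver.maxRate", "100")
      .set("spark.streaming.backpressure.enabled", "true")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(buildConf(), Seconds(1))

    // Assumed source: a text socket on localhost:9999 (e.g. fed by `nc -lk 9999`).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Print the number of records per batch so the effect of the cap is visible.
    lines.count().print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(10000) // run for about 10 seconds, then stop
    ssc.stop()
  }
}
```

With a fast producer on the socket, the per-batch counts should level off near the configured rate if the limit is being honoured.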

[SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-15 Thread Arun Khetarpal
Hi - Wanted to understand if Spark SQL has GRANT and REVOKE statements available? Is anyone working on making that available? Regards, Arun

RE: RDD order preservation through transformations

2017-09-15 Thread johan.grande.ext
Well, the dataframes make it easier to work on some columns of the data only and to store results in new columns, removing the need to zip it all back together and thus to preserve order. On 2017-09-05 14:04 CEST, mehmet.su...@gmail.com wrote: Hi Johan, DataFrames are building on top of

Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-15 Thread rpulluru
Hi, I am using the SparkR randomForest function and running into a java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE error. It looks like I am hitting https://issues.apache.org/jira/browse/SPARK-1476. I set spark.default.parallelism=1000 but am still facing the same issue.

Re: Nested RDD operation

2017-09-15 Thread Jean Georges Perrin
Hey Daniel, not sure this will help, but... I had a similar need where I wanted the content of a dataframe to become a "cell" or a row in the parent dataframe. I grouped by the child dataframe, then collected it as a list in the parent dataframe after a join operation. As I said, not sure it
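One possible reading of the approach described above, sketched with the DataFrame API: join the child rows to the parent, group by the parent's columns, and collect the child values into a list column. The table contents, the column names (parent_id, name, fruit), and the order of the join and the grouping are all assumptions made for illustration; the original message does not give them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

object NestChildrenSketch {
  // Joins children to parents on the assumed key "parent_id", then folds each
  // parent's child values into a single array column named "fruits".
  def nest(parents: DataFrame, children: DataFrame): DataFrame =
    parents
      .join(children, Seq("parent_id"))
      .groupBy("parent_id", "name")
      .agg(collect_list("fruit").as("fruits"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nest-children-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val parents = Seq((1, "basket A"), (2, "basket B")).toDF("parent_id", "name")
    val children = Seq((1, "apple"), (1, "pear"), (2, "orange")).toDF("parent_id", "fruit")

    nest(parents, children).show(truncate = false)
    spark.stop()
  }
}
```

Note that collect_list makes no promise about the order of elements within each list, so this shape suits cases where the child order does not matter.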

Re: RDD order preservation through transformations

2017-09-15 Thread Suzen, Mehmet
Hi Johan, DataFrames are built on top of RDDs, so I am not sure if the ordering issues are different there. Maybe you could create a simulated dataset that is just large enough, plus an example series of transformations, to experiment on. Best, -m Mehmet Süzen, MSc, PhD | PRIVILEGED

Nested RDD operation

2017-09-15 Thread Daniel O' Shaughnessy
Hi guys, I'm having trouble implementing this scenario: I have a column with a typical entry being: ['apple', 'orange', 'apple', 'pear', 'pear']. I need to use a StringIndexer to transform this to: [0, 2, 0, 1, 1]. I'm attempting to do this but because of the nested operation on another RDD I
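To make the target mapping concrete, here is a plain-Scala sketch (no Spark involved) of frequency-ordered label indexing applied element by element to the example entry. It is an illustration of the desired output only, not Spark's StringIndexer implementation; the helper name fitIndex and the alphabetical tie-break between equally frequent labels are assumptions.

```scala
object LabelIndexSketch {
  // Assigns index 0 to the most frequent label, 1 to the next, and so on.
  // Ties are broken alphabetically here (an assumption for determinism).
  def fitIndex(labels: Seq[String]): Map[String, Int] =
    labels
      .groupBy(identity)
      .toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap

  def main(args: Array[String]): Unit = {
    val entry = Seq("apple", "orange", "apple", "pear", "pear")
    val index = fitIndex(entry)

    // Apply the fitted mapping to every element of the entry.
    println(entry.map(index)) // prints List(0, 2, 0, 1, 1)
  }
}
```

In a real job the mapping would normally be fitted once over all values in the column rather than per entry, so that the same label gets the same index in every row.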

RE: RDD order preservation through transformations

2017-09-15 Thread johan.grande.ext
Thanks all for your answers. After reading the provided links I am still uncertain of the details of what I'd need to do to get my calculations right with RDDs. However I discovered DataFrames and Pipelines on the "ML" side of the libs and I think they'll be better suited to my needs. Best,