Re: PLs assist: trying to FlatMap a DataSet / partially OT

2017-09-16 Thread Marco Mistroni
Not exactly... I was not going to flatMap the RDD. In the end I amended my approach to the problem and managed to get the flatMap working on the Dataset. Thanks for answering. Kind regards. On Sep 16, 2017 4:53 PM, "Akhil Das" wrote: > scala> case class Fruit(price: Double, name: String) > defined

Re: Configuration for unit testing and sql.shuffle.partitions

2017-09-16 Thread Femi Anthony
How are you specifying it? As an option to spark-submit? On Sat, Sep 16, 2017 at 12:26 PM, Akhil Das wrote: > spark.sql.shuffle.partitions is still used I believe. I can see it in the code

Re: Configuration for unit testing and sql.shuffle.partitions

2017-09-16 Thread Akhil Das
I believe spark.sql.shuffle.partitions is still used; I can see it in the code and on the documentation page.
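For unit tests, one common pattern is to lower spark.sql.shuffle.partitions (default 200) when building the test session, so tiny test datasets are not shuffled into hundreds of mostly empty tasks. A minimal sketch, assuming Spark 2.x or later with spark-sql on the classpath; the object name ShuffleConfSketch and the value 2 are illustrative choices, not from the thread. The same setting can also be passed to spark-submit as `--conf spark.sql.shuffle.partitions=2`.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: object name and values are assumptions, not from the thread.
object ShuffleConfSketch {
  // Builds a small local session for unit tests with few shuffle partitions.
  def testSession(): SparkSession =
    SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-conf-sketch")
      // Default is 200; a small value suits tiny test datasets.
      .config("spark.sql.shuffle.partitions", "2")
      .getOrCreate()

  // This is a runtime SQL setting, so it can also be changed on a live session.
  def setShufflePartitions(spark: SparkSession, n: Int): Unit =
    spark.conf.set("spark.sql.shuffle.partitions", n.toString)
}
```

Setting it in the builder keeps every test on the same value; changing it per test through spark.conf.set is handy when one test needs a different shuffle width.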

Re: PLs assist: trying to FlatMap a DataSet / partially OT

2017-09-16 Thread Akhil Das
scala> case class Fruit(price: Double, name: String)
defined class Fruit
scala> val ds = Seq(Fruit(10.0,"Apple")).toDS()
ds: org.apache.spark.sql.Dataset[Fruit] = [price: double, name: string]
scala> ds.rdd.flatMap(f => f.name.toList).collect
res8: Array[Char] = Array(A, p, p, l, e)
This is
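The REPL session above drops to the RDD for the flatMap. A minimal sketch of staying on the Dataset API instead, assuming Spark 2.x or later; this is not necessarily how the original poster solved it. Each character is turned into a one-character String because spark.implicits._ provides an Encoder[String], which Dataset.flatMap requires for its result type. The object name DatasetFlatMapSketch is hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class so Spark can derive an Encoder for it.
case class Fruit(price: Double, name: String)

// Hypothetical object name, used only for this sketch.
object DatasetFlatMapSketch {
  // Splits each fruit name into one-character strings without leaving the Dataset API.
  def letters(spark: SparkSession): Dataset[String] = {
    import spark.implicits._ // provides toDS() and the Encoder[String] used by flatMap
    val ds: Dataset[Fruit] = Seq(Fruit(10.0, "Apple")).toDS()
    // Map each Char to a String so the result type has an implicit encoder.
    ds.flatMap(f => f.name.toList.map(_.toString))
  }
}
```

Calling `DatasetFlatMapSketch.letters(spark).collect()` on a local session should yield the same letters as the RDD version, as strings rather than chars.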

Re: Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-16 Thread Akhil Das
What parameters did you pass to the classifier, and how large is your training data? You are hitting that issue because one of the blocks is over 2 GB; repartitioning the data will help. On Fri, Sep 15, 2017 at 7:55 PM, rpulluru wrote: > Hi, > > I am using
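To pick a partition count for the repartition suggested above, a back-of-envelope calculation is usually enough: divide the estimated data size by a per-partition target that sits well below the Integer.MAX_VALUE-byte (about 2 GB) block limit. A minimal sketch in plain Scala; the helper name minPartitions and the 128 MiB default target are assumptions for illustration, not values from the thread.

```scala
// Hypothetical helper, not from the thread: sizes partitions to stay
// well below the ~2 GB (Integer.MAX_VALUE bytes) block limit.
object PartitionSizingSketch {
  val MaxBlockBytes: Long = Int.MaxValue.toLong // 2147483647 bytes

  // Assumed per-partition target of 128 MiB; adjust for your cluster.
  val DefaultTargetBytes: Long = 128L * 1024 * 1024

  /** Smallest partition count that keeps evenly spread data at or below targetBytes per partition. */
  def minPartitions(totalBytes: Long, targetBytes: Long = DefaultTargetBytes): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0 && targetBytes <= MaxBlockBytes, "targetBytes must be in (0, Int.MaxValue]")
    // Ceiling division, with at least one partition.
    math.max(1L, (totalBytes + targetBytes - 1) / targetBytes).toInt
  }
}
```

For example, about 10 GiB of training data gives 80 partitions at the 128 MiB target, and that number could be passed to `trainingData.repartition(n)` before fitting. The calculation assumes rows are spread evenly; heavily skewed data can still produce one oversized partition.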

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Jörn Franke
It depends on the permissions the user has on the local file system or HDFS, so there is no need to have grant/revoke. > On 15. Sep 2017, at 17:13, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and REVOKE statements available? > Is

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-16 Thread Akhil Das
I guess not. I came across a test case where these statements are marked as Unsupported; you can see it here. However, the version running inside Databricks does support this.

Re: spark.streaming.receiver.maxRate

2017-09-16 Thread Akhil Das
I believe that's a question for the NiFi list; as you can see, the code base is quite old https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver/src/main/java/org/apache/nifi/spark and it doesn't make use of the
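spark.streaming.receiver.maxRate is an ordinary Spark configuration key, so it can be set on the SparkConf used to create the StreamingContext, or passed to spark-submit with `--conf`. A minimal sketch with made-up values, assuming the receiver-based Spark Streaming API; whether a given custom receiver, such as the NiFi one, is actually throttled depends on that receiver's implementation.

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the thread does not give any.
object ReceiverRateSketch {
  val conf: SparkConf = new SparkConf()
    .setAppName("receiver-rate-sketch")
    // Receiver-based streams need at least two local threads: one runs the receiver.
    .setMaster("local[2]")
    // Cap on the number of records per second each receiver accepts.
    .set("spark.streaming.receiver.maxRate", "1000")
    // Let Spark Streaming adapt the receiving rate to batch delays and processing times.
    .set("spark.streaming.backpressure.enabled", "true")
}
```

The resulting conf would then be handed to `new StreamingContext(ReceiverRateSketch.conf, Seconds(1))` or similar; maxRate acts as a fixed upper bound, while backpressure adjusts the rate below it as batches slow down.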