How to recursively aggregate Treelike(hierarchical) data using Spark?

2018-09-25 Thread newroyker
The problem statement and a recursive approach to solving it are described here: https://stackoverflow.com/questions/52508872/how-to-recursively-aggregate-treelikehierarchical-data-using-spark I am looking for more elegant or more performant solutions, if they exist. TIA!
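The linked question is not reproduced in this archive, so the following is only a minimal sketch of one non-recursive alternative, under the assumption that the data is a set of (node, parent, value) rows and that the goal is a per-node subtree sum. The function name `subtree_sums` and the row layout are hypothetical. It is plain Python rather than Spark code: totals are rolled up level by level from the leaves, which is the shape an iterative join-and-aggregate loop over a DataFrame could take.

```python
from collections import defaultdict


def subtree_sums(rows):
    """Return {node_id: sum of values in that node's subtree}.

    rows: iterable of (node_id, parent_id, value); parent_id is None for a root.
    Assumes every parent_id refers to a node that is present and that the
    data contains no cycles (hypothetical input contract).
    """
    parent = {}
    total = {}
    for node_id, parent_id, value in rows:
        parent[node_id] = parent_id
        total[node_id] = value

    # Count how many children each node still has to receive totals from.
    pending_children = defaultdict(int)
    for parent_id in parent.values():
        if parent_id is not None:
            pending_children[parent_id] += 1

    # Start at the leaves and push totals upward one level at a time.
    frontier = [n for n in parent if pending_children[n] == 0]
    while frontier:
        next_frontier = []
        for node in frontier:
            p = parent[node]
            if p is None:
                continue  # a root has nothing above it
            total[p] += total[node]
            pending_children[p] -= 1
            if pending_children[p] == 0:
                next_frontier.append(p)  # all children accounted for
        frontier = next_frontier
    return total


rows = [("a", None, 1), ("b", "a", 2), ("c", "a", 3), ("d", "b", 4)]
print(subtree_sums(rows))  # {'a': 10, 'b': 6, 'c': 3, 'd': 4}
```

The loop runs once per tree level, so the number of passes is bounded by the depth of the hierarchy rather than by the number of nodes.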

RE: Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/Yinan, Yes, my test case is also similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736. My spark-submit is as follows: ./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py

[Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-25 Thread Gokula Krishnan D
Hello All, I am calculating the hash value of a few columns to determine whether a record is an Insert/Delete/Update, but I found a scenario that is a little weird: some of the records return the same hash value even though the keys are totally different. For instance, scala> spark.sql("select
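The query in this message is cut off, so the cause of the specific collision cannot be confirmed from the archive. One possible explanation is that Spark SQL's `hash()` returns a 32-bit integer, so distinct keys are expected to collide once the number of rows is large enough. The sketch below is plain Python, not Spark code, and uses the standard birthday approximation to estimate that probability. The function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n_keys, bits=32):
    """Approximate probability that at least two of n_keys distinct keys
    share a hash value, assuming a uniform hash with `bits` output bits.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2 ** bits
    exponent = n_keys * (n_keys - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


# With a 32-bit hash, roughly 77,000 distinct keys already give about
# even odds of at least one collision.
print(collision_probability(77_163))

# A 64-bit value makes a collision far less likely at the same scale.
print(collision_probability(77_163, bits=64))
```

If collisions are the issue, a wider digest over the compared columns, for example Spark SQL's `sha2` applied to a delimited concatenation of the columns, reduces the odds considerably. Whether that fits this use case is an assumption, since the full message is not shown here.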

Re: [Spark SQL]: Java Spark Classes With Attributes of Type Set In Datasets

2018-09-25 Thread Dillon Dukek
Actually, after walking through it in a debug terminal, it appears that the deserializer can properly transform the data on read to an ArrayType, but the serializer doesn't know what to do when we try to go back out from the internal Spark representation. tags, if

can Spark 2.4 work on JDK 11?

2018-09-25 Thread kant kodali
Hi All, can Spark 2.4 work on JDK 11? I feel like a lot of the features added in JDK 9, 10, and 11 can make the deployment process a whole lot better, and of course there is some more syntactic sugar similar to Scala. Thanks!

[Spark SQL]: Java Spark Classes With Attributes of Type Set In Datasets

2018-09-25 Thread ddukek
I'm trying to use a data model that has an instance variable that is a Set. If I leave the type as the abstract Set type, I get an error because Set is an interface and cannot be instantiated. If I then try to make the variable a concrete implementation of Set, I get an analysis exception

Re: Python kubernetes spark 2.4 branch

2018-09-25 Thread Yinan Li
Can you give more details on how you ran your app? Did you build your own image, and which image are you using? On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) wrote: > Hi, > > I am trying to run spark python testcases on k8s based on tag > spark-2.4-rc1. When

Python kubernetes spark 2.4 branch

2018-09-25 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi, I am trying to run the Spark Python test cases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not resolved by the main Python script. Is this a known issue? Regards, Surya

can I model any arbitrary data structure as an RDD?

2018-09-25 Thread kant kodali
Hi All, I am wondering whether I can model any arbitrary data structure as an RDD. For example, can I model red-black trees, suffix trees, radix trees, splay trees, Fibonacci heaps, tries, linked lists, etc. as RDDs? If so, how? To implement a custom RDD I have to implement compute and getPartitions