Off heap memory settings and Tungsten

2017-04-22 Thread geoHeil
Hi, I wonder when to enable Spark's off-heap settings. Shouldn't Tungsten enable these automatically in 2.1? http://stackoverflow.com/questions/43330902/spark-off-heap-memory-config-and-tungsten Regards, Georg
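For reference, a minimal sketch of turning the off-heap settings on explicitly. It is not taken from the thread: the two configuration keys are Spark's documented ones (off-heap use is disabled by default and needs a positive size once enabled), while the `2g` value is an arbitrary assumption.

```scala
import org.apache.spark.SparkConf

// Off-heap memory is opt-in: both the flag and a positive size must be set.
val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "2g") // assumed size, tune per workload
```

The same two keys can also be passed with `--conf` on `spark-submit`.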

Re: Case class with POJO - encoder issues

2017-03-10 Thread geoHeil
http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset describes the problem. Actually, I have the same problem. Is there a simple way to build such an Encoder which serializes into multiple fields? I would not want to replicate the whole JTS geometry class hierarchy

graphframes stateful motif

2017-01-30 Thread geoHeil
Starting out with GraphFrames, I would like to understand stateful motifs better. There is a nice example in the documentation. How can I explicitly return the counts? How could it be extended to count:
- the friends of each vertex with age > 30
- the percentage of friendsGreater30 / allFriends

Migrate spark sql to rdd for better performance

2017-01-03 Thread geoHeil
I optimized a Spark SQL script but have come to the conclusion that the SQL API is not ideal, as the generated tasks are slow and require too much shuffling. So the script should be converted to RDDs: http://stackoverflow.com/q/41445571/2587904 How can I formulate this more efficiently

ThreadPoolExecutor - slow spark job

2016-12-23 Thread geoHeil
it up here http://stackoverflow.com/questions/41298550/spark-threadpoolexecutor-very-often-called-in-tasks as well, with a minimal example at https://github.com/geoHeil/sparkContrastCoding Looking forward to any input to speed up this Spark job. Cheers, Georg

Spark kryo serialization register Datatype[]

2016-12-21 Thread geoHeil
To force Spark to use Kryo serialization I set spark.kryo.registrationRequired to true. Now Spark complains: Class is not registered: org.apache.spark.sql.types.DataType[]. How can I fix this? So far I could not successfully register this class.
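One commonly suggested way to register an array type is to pass the array class itself to `SparkConf.registerKryoClasses`. The sketch below is an assumption about what would address the error above, not a fix confirmed in the thread; with `registrationRequired` set, further internal classes may still need to be registered the same way.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.types.DataType

// DataType[] in the error message is the JVM array type, i.e. Array[DataType] in Scala.
val conf = new SparkConf()
  .set("spark.kryo.registrationRequired", "true")
  // registerKryoClasses also switches spark.serializer to the Kryo serializer.
  .registerKryoClasses(Array[Class[_]](classOf[Array[DataType]]))
```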

Dynamic spark sql

2016-12-12 Thread geoHeil
Hi, I am curious how to dynamically generate Spark SQL in the Scala API. http://stackoverflow.com/q/41102347/2587904 From this list val columnsFactor = Seq("bar", "baz") I want to generate multiple withColumn statements dfWithNewLabels.withColumn("replace", lit(null: String))
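A common idiom for generating a chain of `withColumn` calls from a list is to fold over the column names. The sketch below is illustrative only: the helper name `addPlaceholderColumns` and the `_replace` suffix are hypothetical, since the preview does not show the intended column names or values.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

val columnsFactor = Seq("bar", "baz")

// Thread the DataFrame through one withColumn call per name.
// Hypothetical naming: each new column is called "<name>_replace".
def addPlaceholderColumns(df: DataFrame, names: Seq[String]): DataFrame =
  names.foldLeft(df) { (acc, name) =>
    acc.withColumn(s"${name}_replace", lit(null).cast(StringType))
  }
```

Calling `addPlaceholderColumns(dfWithNewLabels, columnsFactor)` would then append one null string column per entry in the list.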

Debugging persistence of custom estimator

2016-12-07 Thread geoHeil
Hi, I am writing my first custom Spark pipeline components with persistence and have trouble debugging them. https://github.com/geoHeil/sparkCustomEstimatorPersistenceProblem holds a minimal example where `sbt run` and `sbt test` result in "different" errors. When I tried to debug it in

code generation memory issue

2016-11-21 Thread geoHeil
I am facing a strange issue when trying to correct some errors in my raw data. The problem is reported here: https://issues.apache.org/jira/browse/SPARK-18532

Fill nan with last (good) value

2016-11-17 Thread geoHeil
How can I fill NaN values with the last (good) value? For me, it would be enough to fill it with the previous value of a window function. So far I could not get it to work, as my window function only returns NaN values. Here is code for a minimal example:
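The poster's own example is not included in this preview. As a stand-in, the following is a plain Scala sketch of the intended forward-fill semantics on an ordered sequence, independent of Spark; the function name `forwardFill` and the sample values are assumptions made for illustration.

```scala
// Forward fill: replace each NaN with the most recent non-NaN value.
// Leading NaNs stay NaN because there is no earlier good value to carry forward.
def forwardFill(values: Seq[Double]): Seq[Double] =
  values.scanLeft(Double.NaN) { (lastGood, v) =>
    if (v.isNaN) lastGood else v
  }.tail // drop the NaN seed that scanLeft prepends

val filled = forwardFill(Seq(Double.NaN, 1.0, Double.NaN, Double.NaN, 4.0, Double.NaN))
// filled: NaN, 1.0, 1.0, 1.0, 4.0, 4.0
```

In Spark itself, a similar effect is usually obtained by first turning NaN into null and then applying `last` with `ignoreNulls = true` over an ordered window that ends at the current row.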