XGBoost on PySpark

2018-05-19 Thread Aakash Basu
Hi guys, I need help implementing XGBoost in PySpark. According to a popular thread about XGBoost, it is available for Scala and Java but not for Python. However, we have to implement a Pythonic distributed solution (on Spark), maybe using DMLC or similar, to go ahead with

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread weand
Nobody has any idea? Is filtering after aggregation in Structured Streaming supported but maybe buggy? See the following line in the example from the earlier mail: ... .where(F.expr("distinct_username >= 2")) ... -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark is not evenly distributing data

2018-05-19 Thread SparkUser6

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread Ted Yu
Hi, w.r.t. ElementTrackingStore: since it is backed by KVStore, there should be other classes that occupy significant memory. Can you pastebin the top 10 entries in the heap dump? Thanks

Spark UNEVENLY distributing data

2018-05-19 Thread Alchemist
I am trying to parallelize a simple Spark program that processes HBase data in parallel.
// Get HBase RDD
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = jsc.newAPIHadoopRDD(conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
long