unsubscribe

2019-06-24 Thread Dave Moyers

Re: Does Random Forest in spark ML supports multi label classification in scala

2017-11-07 Thread Dave Moyers
Yes, see https://dzone.com/articles/predictive-analytics-with-spark-ml Although the example uses two labels, the same approach supports multiple labels. Sent from my iPad > On Nov 7, 2017, at 6:30 AM, HARSH TAKKAR wrote: > > Hi > > Does Random Forest in spark Ml

Re: Spark Job Hanging on Join

2016-02-23 Thread Dave Moyers
ng fast afterwards :) > > On Feb 22, 2016 21:24, "Dave Moyers" <davemoy...@icloud.com> wrote: >> Good article! Thanks for sharing! >> >> >> > On Feb 22, 2016, at 11:10 AM, Davies Liu <dav...@databricks.com> wrote: >> >

Re: Spark Job Hanging on Join

2016-02-22 Thread Dave Moyers
Good article! Thanks for sharing! > On Feb 22, 2016, at 11:10 AM, Davies Liu wrote: > > This link may help: > https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html > > Spark 1.6 had improved the CartesianProduct, you should

Re: spark-xml can't recognize schema

2016-02-21 Thread Dave Moyers
Make sure the xml input file is well formed (check your end tags). Sent from my iPhone > On Feb 21, 2016, at 8:14 AM, Prathamesh Dharangutte > wrote: > > This is the code I am using for parsing xml file: > > > > import org.apache.spark.{SparkConf,SparkContext} >

Re: Spark Job Hanging on Join

2016-02-20 Thread Dave Moyers
Try this setting in your Spark defaults: spark.sql.autoBroadcastJoinThreshold=-1. I had a similar problem with joins hanging, and that resolved it for me. You might be able to pass that value from the driver as a --conf option, but I have not tried that and am not sure it will work. Sent
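
For readers following this thread, the setting named in this message could be applied roughly as follows. This is a minimal sketch, assuming a standard Spark installation where defaults live in conf/spark-defaults.conf. The --conf form is the alternative the message describes as untried, and the class and jar names are hypothetical placeholders.

```properties
# conf/spark-defaults.conf
# A value of -1 disables automatic broadcast joins, so Spark will not
# try to broadcast a table that it estimates to be small.
spark.sql.autoBroadcastJoinThreshold  -1

# Alternative (not tried in the original message): pass the value at submit time.
# com.example.MyJob and my-job.jar are hypothetical placeholders.
#   spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 --class com.example.MyJob my-job.jar
```

Disabling broadcast joins trades a possibly faster join for a shuffle-based one, so it is worth confirming in the query plan that the hang was actually tied to the broadcast step.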

Best way to use Spark UDFs via Hive (Spark Thrift Server)

2015-10-22 Thread Dave Moyers
Hi, we have several UDFs written in Scala that we use within jobs submitted to Spark. They work perfectly with the sqlContext after being registered. We also allow access to saved tables via the Hive Thrift server bundled with Spark. However, we would like to allow Hive connections to use