spark.sql.autoBroadcastJoinThreshold not taking effect

2019-05-10 Thread V0lleyBallJunki3
Hello, I have set spark.sql.autoBroadcastJoinThreshold=1GB and I am running the spark job. However, my application is failing with: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at

Re: ml Pipeline read write

2019-05-10 Thread Koert Kuipers
i guess it simply is never set, in which case it is created in:

    protected final def sparkSession: SparkSession = {
      if (optionSparkSession.isEmpty) {
        optionSparkSession = Some(SparkSession.builder().getOrCreate())
      }
      optionSparkSession.get
    }
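The accessor quoted in this message follows a "use what was set, otherwise create it on first access" pattern. The sketch below illustrates that pattern in plain Java with no Spark dependency. `LazyHolder`, `set`, `get` and `fallbackCalls` are hypothetical names chosen for illustration, and the `Supplier` merely stands in for the `SparkSession.builder().getOrCreate()` call; this is not Spark's implementation.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for the "return what was set, else create it lazily" pattern.
class LazyHolder<T> {
    private Optional<T> value = Optional.empty();
    private final Supplier<T> fallback;
    private int fallbackCalls = 0;

    LazyHolder(Supplier<T> fallback) {
        this.fallback = fallback;
    }

    // Explicitly provide the value, analogous to setting the session up front.
    void set(T v) {
        value = Optional.of(v);
    }

    // Return the stored value; on first use with nothing set, create it via the fallback.
    T get() {
        if (!value.isPresent()) {
            fallbackCalls++;
            value = Optional.of(fallback.get());
        }
        return value.get();
    }

    // Number of times the fallback was invoked (exposed only to make the behaviour observable).
    int fallbackCalls() {
        return fallbackCalls;
    }
}
```

If nothing ever calls `set`, every `get` after the first returns the instance produced by the fallback, which matches the "it simply is never set" observation.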

spark error when initializing spark session in java

2019-05-10 Thread Serena S Yuan
Hi, when I run the following code within a bigger function, there is an error:

    SparkConf sparkConf = new SparkConf()
        .setAppName("ContactListenerExample")
        .setMaster("local[2]")
        .set("spark.executor.memory", "1g");
    SparkContext sc = new SparkContext(sparkConf);

Here is the error:

ml Pipeline read write

2019-05-10 Thread Koert Kuipers
i am trying to understand how ml persists pipelines. it seems a SparkSession or SparkContext is needed for this, to write to hdfs. MLWriter and MLReader both extend BaseReadWrite to have access to a SparkSession. but this is where it gets confusing... the only way to set the SparkSession seems to

Re: Question about SaveMode.Ignore behaviour

2019-05-10 Thread Juho Autio
Never mind, I noticed that mode=ignore doesn't write anything if the target path exists, even if the existing files are only in partitions other than the ones being written to. So ignore mode can't be used to mitigate the FileAlreadyExistsException problem of append mode.
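The behaviour described in this message can be modelled in a few lines. The sketch below is a plain-Java, in-memory model, not Spark code: `IgnoreModeModel`, `writeIgnore` and `hasPartition` are hypothetical names, and the map only stands in for a file system. It assumes the semantics reported above, namely that the existence check applies to the whole target path rather than to individual partitions.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the reported mode=ignore behaviour (illustrative only, not Spark's code).
class IgnoreModeModel {
    // target path -> (partition name -> content); a stand-in for a file system
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Returns true if the data was written, false if the write was skipped.
    boolean writeIgnore(String targetPath, String partition, String content) {
        if (store.containsKey(targetPath)) {
            // The target path already exists: skip, whichever partitions it holds.
            return false;
        }
        Map<String, String> partitions = new HashMap<>();
        partitions.put(partition, content);
        store.put(targetPath, partitions);
        return true;
    }

    // True if the given partition exists under the target path.
    boolean hasPartition(String targetPath, String partition) {
        Map<String, String> partitions = store.get(targetPath);
        return partitions != null && partitions.containsKey(partition);
    }
}
```

In this model, a second write to the same target path with a new partition is silently dropped, which is why ignore mode cannot act as a partition-level "write if absent".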

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
Sure, NP. I meant these topics [image: image.png]. Have a look at this article of mine https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ under the section "Understanding the Spark Application Through Visualization". See if it helps. HTH

Re: Spark on yarn - application hangs

2019-05-10 Thread Mkal
How can I check what exactly is stagnant? Do you mean on the DAG visualization in the Spark UI? Sorry, I'm new to Spark.

Re: Spark on yarn - application hangs

2019-05-10 Thread Mich Talebzadeh
Hi, have you checked the metrics from the Spark UI by any chance? What is stagnant? HTH

Spark on yarn - application hangs

2019-05-10 Thread Mkal
I've built a Spark job in which an external program is called through pipe(). The job runs correctly on the cluster when the input is a small sample dataset, but when the input is a real, large dataset it stays in the RUNNING state forever. I've tried different ways to tune executor memory, executor

Re: Spark not doing a broadcast join in spite of the table being well below spark.sql.autoBroadcastJoinThreshold

2019-05-10 Thread V0lleyBallJunki3
So what I discovered was that if I write the table being joined to disk and then read it again, Spark correctly broadcasts it. I think it is because when Spark estimates the size of the smaller table, it incorrectly estimates it to be much bigger than it is and hence decides to do a

Spark Elasticsearch Connector | Index and Update

2019-05-10 Thread Akshay Bhardwaj
Hi All, are there any users who have integrated Spark Structured Streaming with Elasticsearch 6.x? In the Elasticsearch documentation, under Elasticsearch Hadoop Configuration, there is a property es.write.operation which is defined as