Re: A bug in spark or hadoop RPC with kerberos authentication?

2017-08-22 Thread 周康
You can check out the Hadoop credential class in Spark's YARN module. During spark-submit it will use the configuration on the classpath. I wonder how you reference your own config?

Spark Authentication on Mesos Client Mode

2017-08-22 Thread Kalvin Chau
Hi, is it possible to use spark.authenticate with shared secrets in Mesos client mode? I can get the driver to start up and expect to use authenticated channels, but when the executors start up the SecurityManager outputs "authentication: disabled". Looking through the
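(For reference, a minimal sketch of the settings under discussion, assuming a Scala driver; the secret value is a placeholder, and whether the Mesos executors actually pick it up in client mode is exactly the open question above.)

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Sketch only: enable RPC authentication with a shared secret.
    // The secret is a placeholder; in client mode the executors must see the
    // same two settings, which is what the question above is about.
    val conf = new SparkConf()
      .setAppName("auth-sketch")
      .set("spark.authenticate", "true")
      .set("spark.authenticate.secret", "change-me")
    val spark = SparkSession.builder().config(conf).getOrCreate()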

Re: Spark submit OutOfMemory Error in local mode

2017-08-22 Thread Naga G
Increase the cores, as you're trying to run multiple threads. Sent from Naga iPad > On Aug 22, 2017, at 3:26 PM, "u...@moosheimer.com" wrote: > > Since you didn't post any concrete information it's hard to give you advice. > > Try to increase the executor memory

[Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-08-22 Thread Karthik Palaniappan
I ran the HdfsWordCount example using this command: spark-submit run-example \ --conf spark.streaming.dynamicAllocation.enabled=true \ --conf spark.executor.instances=0 \ --conf spark.dynamicAllocation.enabled=false \ --conf spark.master=yarn \ --conf spark.submit.deployMode=client \
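(The full command is truncated above; for readers following along, a sketch of the same configuration combination expressed programmatically, with a placeholder app name.)

    import org.apache.spark.SparkConf

    // Sketch mirroring the flags above: streaming dynamic allocation on,
    // Core dynamic allocation off (the streaming allocation manager requires
    // it to be disabled), and no initial executors so scaling decisions are
    // left entirely to the streaming allocation manager.
    val conf = new SparkConf()
      .setAppName("streaming-dyn-alloc-sketch")
      .set("spark.streaming.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.executor.instances", "0")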

Re: Spark submit OutOfMemory Error in local mode

2017-08-22 Thread u...@moosheimer.com
Since you didn't post any concrete information it's hard to give you advice. Try to increase the executor memory (spark.executor.memory). If that doesn't help, give all the experts in the community a chance to help you by adding more details like version, logfile, source etc. Mit
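(A minimal sketch related to the suggestion above, assuming local mode and placeholder sizes. In local mode the driver and executors share one JVM, so driver memory is usually the value that matters, and it has to be supplied at launch rather than set after the JVM has started; the snippet just shows how to check what the running application actually got.)

    import org.apache.spark.sql.SparkSession

    // Sketch only, placeholder values. In local mode everything runs in the
    // driver JVM, so spark.executor.memory has little effect; heap size comes
    // from the driver (e.g. spark-submit --driver-memory 4g).
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("local-oom-sketch")
      .getOrCreate()
    println(spark.conf.getOption("spark.driver.memory"))              // what was requested
    println(Runtime.getRuntime.maxMemory / (1024 * 1024) + " MB heap") // what the JVM has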

[Streaming][Structured Streaming] Understanding dynamic allocation in streaming jobs

2017-08-22 Thread Karthik Palaniappan
I'm trying to understand dynamic allocation in Spark Streaming and Structured Streaming. It seems if you set spark.dynamicAllocation.enabled=true, both frameworks use Core's dynamic allocation algorithm -- request executors if the task backlog is a certain size, and remove executors if they
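(For concreteness, a sketch of the Core dynamic-allocation settings being referred to; all numbers are illustrative placeholders, not recommendations.)

    import org.apache.spark.SparkConf

    // Sketch of Core (batch) dynamic allocation, the algorithm both streaming
    // frameworks fall back to when spark.dynamicAllocation.enabled=true.
    // Removing executors requires the external shuffle service.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .set("spark.shuffle.service.enabled", "true")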

What is the right way to stop a streaming application?

2017-08-22 Thread shyla deshpande
Hi all, I am running a Spark Streaming application on an AWS EC2 cluster in standalone mode. I am using DStreams and Spark 2.0.2, and I have stopGracefullyOnShutdown set to true. What is the right way to stop the streaming application? Thanks
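(A minimal sketch of the two graceful-stop mechanisms in Spark 2.0.x; the socket source, batch interval, and app name are placeholders.)

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sketch only. With stopGracefullyOnShutdown=true a SIGTERM (e.g. from the
    // cluster manager) lets in-flight batches finish. The explicit stop() call
    // is the programmatic equivalent and would normally be invoked from a
    // shutdown hook or a monitoring thread, not inline as shown here.
    val conf = new SparkConf()
      .setAppName("graceful-stop-sketch")
      .set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.socketTextStream("localhost", 9999).print()  // placeholder stream
    ssc.start()
    // ... later, on an external stop signal:
    ssc.stop(stopSparkContext = true, stopGracefully = true)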

Re: Spark submit OutOfMemory Error in local mode

2017-08-22 Thread shitijkuls
Any help here will be appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-OutOfMemory-Error-in-local-mode-tp29081p29096.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Pierce Lamb
Hi Jake, There is another option among the third-party projects in the Spark database ecosystem that have combined Spark with a DBMS in such a way that the DataFrame API has been extended to include UPDATE operations
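(For contrast, a sketch of what stock Spark offers out of the box: the built-in JDBC writer appends or overwrites, it does not issue UPDATE statements, which is why UPDATE support comes from third-party extensions. The URL, table name, and credentials below are placeholders.)

    import java.util.Properties
    import org.apache.spark.sql.{SaveMode, SparkSession}

    // Sketch only: append a small DataFrame to a MySQL table via plain JDBC.
    // Connection details are placeholders.
    val spark = SparkSession.builder().appName("jdbc-write-sketch").getOrCreate()
    val df = spark.range(10).toDF("id")
    val props = new Properties()
    props.setProperty("user", "db_user")
    props.setProperty("password", "db_password")
    df.write.mode(SaveMode.Append)
      .jdbc("jdbc:mysql://host:3306/db", "target_table", props)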

A bug in spark or hadoop RPC with kerberos authentication?

2017-08-22 Thread Sun, Keith
Hello, I ran into a very weird issue which is easy to reproduce but kept me stuck for more than a day. I suspect this may be an issue/bug related to the class loader. Can you help confirm the root cause? I want to specify a customized Hadoop configuration set instead of the one on the classpath (we
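(A sketch of one way to point Spark at a Hadoop configuration set that is not on the classpath; the file paths are placeholders, and whether this interacts cleanly with the Kerberos login path is exactly what the report above is trying to pin down.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    // Sketch only: load custom core-site/hdfs-site files into the running
    // application's Hadoop configuration. Paths are placeholders; Kerberos
    // login (UserGroupInformation) may still consult the classpath config,
    // which is the suspected class-loader issue above.
    val spark = SparkSession.builder().appName("custom-hadoop-conf-sketch").getOrCreate()
    val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration
    hadoopConf.addResource(new Path("/path/to/custom/core-site.xml"))
    hadoopConf.addResource(new Path("/path/to/custom/hdfs-site.xml"))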

Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Jake Russ
Hi Mich, Thank you for the explanation; that makes sense and helps me understand the bigger picture of how Spark and an RDBMS fit together. Happy to know I’m already following best practice. Cheers, Jake From: Mich Talebzadeh Date: Monday, August 21, 2017 at 6:44 PM To:

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-22 Thread Cody Koeninger
Kafka RDDs need to start from a specified offset; you really don't want the executors just starting at whatever offset happened to be the latest at the time they ran. If you need a way to figure out the latest offset at the time the driver starts up, you can always use a consumer to read the offsets
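(A sketch of the "use a consumer to read the offsets" approach, assuming a Kafka 0.10.1+ client, which provides endOffsets; broker, topic, and group id are placeholders.)

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Sketch only: at driver start-up, ask a plain consumer for the current end
    // offsets, then hand them to Assign so the direct stream begins exactly there.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-group",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    val probe = new KafkaConsumer[String, String](kafkaParams.asJava)
    val latest = probe.endOffsets(
        probe.partitionsFor("my-topic").asScala
          .map(p => new TopicPartition(p.topic, p.partition)).asJava)
      .asScala.map { case (tp, o) => tp -> o.longValue }.toMap
    probe.close()

    val ssc = new StreamingContext(
      new SparkConf().setAppName("start-at-latest-sketch"), Seconds(5))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](latest.keys.toSeq, kafkaParams, latest))
    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination()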

Fwd: ORC Transaction Table - Spark

2017-08-22 Thread Aviral Agarwal
Hi, I am trying to read a Hive ORC transactional table through Spark, but I am getting the following error: Caused by: java.lang.RuntimeException: serious problem at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021) at

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
Jörn, my question is not about the model type but about Spark's ability to reuse an already trained ML model when training a new model. On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke wrote: > Is it really required to have one billion samples for just linear >

Re: UI for spark machine learning.

2017-08-22 Thread Jörn Franke
Is it really required to have one billion samples for just linear regression? Probably your model would do equally well with far fewer samples. Have you checked bias and variance when using far fewer random samples? > On 22. Aug 2017, at 12:58, Sea aj wrote: > > I have a
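(A minimal sketch of the down-sampling suggestion; the 1% fraction, the seed, and the placeholder DataFrame stand in for the real billion-row data set.)

    import org.apache.spark.sql.SparkSession

    // Sketch only: draw a random fraction of the rows before fitting, then
    // compare training/validation error against a larger sample to judge
    // bias and variance. The fraction is arbitrary.
    val spark = SparkSession.builder().appName("sample-sketch").getOrCreate()
    val df = spark.range(0, 1000000L).toDF("id")  // placeholder for the real data
    val sampled = df.sample(withReplacement = false, fraction = 0.01, seed = 42L)
    println(s"sampled ${sampled.count()} of ${df.count()} rows")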

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
I have a large DataFrame of 1 billion rows of type LabeledPoint. I tried to train a linear regression model on the df, but it failed due to lack of memory although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU cores. I decided to split my data into multiple chunks and train the model in
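(A sketch of the splitting step described above, using randomSplit with equal placeholder weights; note that Spark ML's LinearRegression has no built-in warm-start or initial-model option for continuing training across chunks, which is the crux of the question.)

    import org.apache.spark.sql.SparkSession

    // Sketch only: split a DataFrame into 10 roughly equal chunks. Weights are
    // relative, so ten equal weights give ten ~10% pieces.
    val spark = SparkSession.builder().appName("chunk-sketch").getOrCreate()
    val df = spark.range(0, 1000000L).toDF("id")  // placeholder for the real data
    val chunks = df.randomSplit(Array.fill(10)(1.0), seed = 7L)
    chunks.zipWithIndex.foreach { case (chunk, i) =>
      println(s"chunk $i: ${chunk.count()} rows")
    }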

Re: [SS] Why is a streaming aggregation required for complete output mode?

2017-08-22 Thread Jacek Laskowski
Hi, Thanks a lot, Burak, for the explanation! I appreciate it a lot (and promise to share the great news far and wide once I get the gist of the internals myself). What I'm still missing is which part of Structured Streaming is responsible for enforcing the semantics of output modes. Once defined for a

Re: Scala closure exceeds ByteArrayOutputStream limit (~2gb)

2017-08-22 Thread Mungeol Heo
Hello, Joel. Have you solved the problem with Java's 32-bit limit on array sizes? Thanks. On Wed, Jan 27, 2016 at 2:36 AM, Joel Keller wrote: > Hello, > > I am running RandomForest from mllib on a data-set which has very-high > dimensional data (~50k dimensions). > >

Re: SPARK Issue in Standalone cluster

2017-08-22 Thread Sea aj
Hi everyone, I have a huge DataFrame with 1 billion rows, and each row is a nested list. That being said, I want to train some ML models on this df, but due to the huge size I get an out-of-memory error on one of my nodes when I run the fit function. Currently, my configuration is: 144 cores, 16 cores for
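(The exact figures above are cut off; for readers in a similar spot, a sketch of the standalone-mode knobs that usually get balanced in this situation, with placeholder values that echo the core counts mentioned.)

    import org.apache.spark.SparkConf

    // Sketch only; every number is a placeholder to be sized against the actual
    // cluster. In standalone mode spark.cores.max caps the total cores an
    // application takes, while executor memory/cores shape how each node
    // behaves during the ML fit.
    val conf = new SparkConf()
      .setAppName("standalone-memory-sketch")
      .set("spark.cores.max", "144")
      .set("spark.executor.cores", "16")
      .set("spark.executor.memory", "90g")
      .set("spark.driver.memory", "16g")
      .set("spark.memory.fraction", "0.6")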