Re: Spark per app logging

2015-03-21 Thread Jeffrey Jedele
Hi, I'm not completely sure about this either, but this is what we are doing currently: configure your logging to write to STDOUT, not to a file explicitly. Spark will capture stdout and stderr and separate the messages into an app/driver folder structure in the configured worker directory. We
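A minimal log4j.properties sketch for this approach (the logger names and pattern are illustrative; Spark 1.x ships a similar template in conf/log4j.properties.template):

```properties
# Send all logging to the console so the Spark worker can capture it
# into its per-application work directory.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```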

Re: About the env of Spark1.2

2015-03-21 Thread sandeep vura
Make sure, if you are using 127.0.0.1, to check /etc/hosts and comment out the 127.0.1.1 entry or map it to localhost. On Sat, Mar 21, 2015 at 9:57 AM, Ted Yu yuzhih...@gmail.com wrote: bq. Caused by: java.net.UnknownHostException: dhcp-10-35-14-100: Name or service not known Can you check

Re: How to set Spark executor memory?

2015-03-21 Thread Xi Shen
Hi Sean, It's getting strange now. If I run from the IDE, my executor memory is always set to 6.7G, no matter what value I set in code. I have checked my environment variables, and there's no value of 6.7, or 12.5. Any idea? Thanks, David On Tue, 17 Mar 2015 00:35 null jishnu.prat...@wipro.com wrote:

Re: Can I start multiple executors in local mode?

2015-03-21 Thread Xi Shen
No, I didn't mean local-cluster. I mean running locally, like in the IDE. On Mon, 16 Mar 2015 23:12 xu Peng hsxup...@gmail.com wrote: Hi David, You can try local-cluster. The numbers in local-cluster[2,2,1024] mean 2 workers, 2 cores per worker, and 1024 MB of memory per worker. Best Regards Peng Xu

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Ted Yu
Mike: Once hadoop 2.7.0 is released, you should be able to enjoy the enhanced performance of s3a. See HADOOP-11571 Cheers On Sat, Mar 21, 2015 at 8:09 AM, Chris Fregly ch...@fregly.com wrote: hey mike! you'll definitely want to increase your parallelism by adding more shards to the stream -

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Chris Fregly
hey mike! you'll definitely want to increase your parallelism by adding more shards to the stream - as well as spinning up 1 receiver per shard and unioning all the resulting streams, per the KinesisWordCount example that is included with the kinesis streaming package.  you'll need more cores (cluster) or
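A sketch of the one-receiver-per-shard pattern, assuming the Spark 1.3 kinesis-asl package; the stream name, endpoint, and shard count are placeholders:

```scala
// Sketch only: spin up one Kinesis receiver per shard and union the
// resulting DStreams into a single stream for downstream processing.
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(sc, Seconds(10))
val numShards = 4  // match the number of shards in the Kinesis stream
val streams = (1 to numShards).map { _ =>
  KinesisUtils.createStream(ssc, "myStream",
    "https://kinesis.us-east-1.amazonaws.com", Duration(2000),
    InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}
val unified = ssc.union(streams)  // one DStream backed by numShards receivers
```

Each receiver occupies a core, so the cluster needs at least numShards cores plus cores for processing.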

'nested' RDD problem, advise needed

2015-03-21 Thread Michael Lewis
Hi, I wonder if someone can help suggest a solution to my problem. I had a simple process working using Strings and now want to convert to RDD[Char]; the problem is that I end up with a nested call as follows: 1) Load a text file into an RDD[Char] val inputRDD =

Re: Spark Streaming Not Reading Messages From Multiple Kafka Topics

2015-03-21 Thread Jeffrey Jedele
Hey Eason! Weird problem indeed. More information will probably help to find the issue: Have you searched the logs for peculiar messages? What does your Spark environment look like? #workers, #threads, etc.? Does it work if you create separate receivers for the topics? Regards, Jeff 2015-03-21

Re: How to set Spark executor memory?

2015-03-21 Thread Sean Owen
If you are running from your IDE, then I don't know what you are running or in what mode. The discussion here concerns using standard mechanisms like spark-submit to configure executor memory. Please try these first instead of trying to directly invoke Spark, which will require more understanding
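A typical spark-submit invocation that sets executor memory explicitly (the class name, master URL, and sizes here are only examples):

```shell
# Set executor and driver memory via the standard submission mechanism
# rather than in application code after the JVM has already started.
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --executor-memory 4G \
  --driver-memory 2G \
  myapp.jar
```

Note that settings like spark.executor.memory must be in place before the JVM launches, which is why setting them in code from an IDE often has no effect.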

Model deployment help

2015-03-21 Thread Shashidhar Rao
Hi, Apologies for the generic question. I am developing predictive models for the first time, and the model will be deployed in production very soon. Could somebody help me with model deployment in production? I have read quite a few articles on model deployment and have read some books on

Re: Spark 1.3 Dynamic Allocation - Requesting 0 new executor(s) because tasks are backlogged

2015-03-21 Thread Ted Yu
bq. Requesting 1 new executor(s) because tasks are backlogged 1 executor was requested. Which hadoop release are you using ? Can you check resource manager log to see if there is some clue ? Thanks On Fri, Mar 20, 2015 at 4:17 PM, Manoj Samel manojsamelt...@gmail.com wrote: Forgot to add -
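For reference, dynamic allocation in Spark 1.2/1.3 is driven by a handful of settings; a minimal spark-defaults.conf sketch with illustrative values (dynamic allocation also requires the external shuffle service on the workers):

```properties
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
spark.shuffle.service.enabled          true
```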

Re: Upgrade from Spark 1.1.0 to 1.1.1+ Issues

2015-03-21 Thread Eason Hu
Thank you for your help Akhil! We found that it is no longer working from our laptop to remotely connect to the remote Spark cluster, but it works if the client is on the remote cluster as well, starting from version 1.2.0 and beyond (v1.1.1 and below are fine). Not sure if this is related

Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)

2015-03-21 Thread Steve Loughran
1. Make sure your secret key doesn't have a / in it. If it does, generate a new key. 2. jets3t and Hadoop JAR versions need to be in sync; jets3t 0.9.0 was picked up in Hadoop 2.4 and not AFAIK 3. Hadoop 2.6 has a new S3 client, s3a, which is compatible with s3n data. It uses the AWS toolkit
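A minimal s3a configuration sketch for Hadoop 2.6+ (values are placeholders; these properties go in core-site.xml, or can be passed as spark.hadoop.* conf entries):

```properties
fs.s3a.impl         org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.access.key   YOUR_ACCESS_KEY
fs.s3a.secret.key   YOUR_SECRET_KEY
```

With this in place, paths are addressed as s3a:// instead of s3n://.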

Re: saveAsTable broken in v1.3 DataFrames?

2015-03-21 Thread Michael Armbrust
I believe that you can get what you want by using HiveQL instead of the pure programatic API. This is a little verbose so perhaps a specialized function would also be useful here. I'm not sure I would call it saveAsExternalTable as there are also external spark sql data source tables that have

Spark streaming alerting

2015-03-21 Thread Mohit Anchlia
Is there a module in Spark Streaming that lets you listen to alerts/conditions as they happen in the streaming job? Generally Spark Streaming components execute on a large cluster alongside systems like HDFS or Cassandra; however, when it comes to alerting you generally can't send it directly from
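There is no built-in alerting module; a common pattern is to surface matching events on the driver with foreachRDD and forward them to an external system. A sketch, where isAlert and sendAlert are hypothetical placeholders (e.g. a predicate and an HTTP POST to a notification service):

```scala
// Sketch: evaluate alert conditions per batch and push a bounded number
// of matches to the driver, which forwards them out of the cluster.
stream.foreachRDD { rdd =>
  val alerts = rdd.filter(event => isAlert(event)).take(100)
  alerts.foreach(a => sendAlert(a))
}
```

take(100) bounds how much data reaches the driver; for high alert volumes, sending from the executors inside foreachPartition avoids the driver bottleneck.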

join two DataFrames, same column name

2015-03-21 Thread Eric Friedman
I have a couple of data frames that I pulled from SparkSQL and the primary key of one is a foreign key of the same name in the other. I'd rather not have to specify each column in the SELECT statement just so that I can rename this single column. When I try to join the data frames, I get an
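One workaround in the Spark 1.3 DataFrame API is to rename the shared column on one side before joining, so the result has no ambiguous names. A sketch, where "id" stands in for the actual key column:

```scala
// Rename the foreign-key column on the right side, then join on the
// now-distinct names. Afterwards, select only the columns you need.
val right = df2.withColumnRenamed("id", "fk_id")
val joined = df1.join(right, df1("id") === right("fk_id"))
```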

Re: Did DataFrames break basic SQLContext?

2015-03-21 Thread Michael Armbrust
Now, I am not able to directly use my RDD object and have it implicitly become a DataFrame. It can be used as a DataFrameHolder, of which I could write: rdd.toDF.registerTempTable(foo) The rationale here was that we added a lot of methods to DataFrame and made the implicits more
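A short sketch of the Spark 1.3 idiom being discussed (the case class and table name are illustrative):

```scala
// In Spark 1.3 the RDD-to-DataFrame conversion became an explicit toDF()
// call, enabled by importing the SQLContext implicits.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Record(key: Int, value: String)
val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))
rdd.toDF().registerTempTable("foo")
```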

Re: How to set Spark executor memory?

2015-03-21 Thread Xi Shen
In the log, I saw MemoryStore: MemoryStore started with capacity 6.7GB. But I still cannot find where to set this storage capacity. On Sat, 21 Mar 2015 20:30 Xi Shen davidshe...@gmail.com wrote: Hi Sean, It's getting strange now. If I ran from IDE, my executor memory is always set to
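For context: in Spark 1.x the MemoryStore capacity is not set directly; it is derived from the JVM heap as roughly heap × spark.storage.memoryFraction (default 0.6) × a safety fraction (0.9). Note that 12.5 GB × 0.6 × 0.9 ≈ 6.75 GB, which matches the 6.7 GB in the log, suggesting the JVM was launched with about a 12.5 GB heap. The relevant settings (values illustrative):

```properties
# Cache capacity ≈ executor heap × memoryFraction (0.6) × safety (0.9)
spark.executor.memory          12g
spark.storage.memoryFraction   0.6
```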

netlib-java cannot load native lib in Windows when using spark-submit

2015-03-21 Thread Xi Shen
Hi, I use the *OpenBLAS* DLL, and have configured my application to work in the IDE. When I start my Spark application from the IntelliJ IDE, I can see in the log that the native lib is loaded successfully. But if I use *spark-submit* to start my application, the native lib still cannot be loaded. I saw

Re: How to set Spark executor memory?

2015-03-21 Thread Xi Shen
Yeah, I think it is harder to troubleshoot property issues in an IDE. But the reason I stick to the IDE is that if I use spark-submit, the native BLAS library cannot be loaded. Maybe I should open another thread to discuss that. Thanks, David On Sun, 22 Mar 2015 10:38 Xi Shen davidshe...@gmail.com

Reducing Spark's logging verbosity

2015-03-21 Thread Edmon Begoli
Hi, Does anyone have concrete recommendations on how to reduce Spark's logging verbosity? We have attempted on several occasions to address this by setting various log4j properties, both in configuration property files and in $SPARK_HOME/conf/spark-env.sh; however, all of those attempts have
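The usual approach is to copy $SPARK_HOME/conf/log4j.properties.template to conf/log4j.properties and raise the levels there (spark-env.sh is not read for log4j settings). A sketch:

```properties
# conf/log4j.properties - quiet everything except warnings and errors
log4j.rootCategory=WARN, console
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=ERROR
```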

Re: netlib-java cannot load native lib in Windows when using spark-submit

2015-03-21 Thread Ted Yu
Can you try the --driver-library-path option ? spark-submit --driver-library-path /opt/hadoop/lib/native ... Cheers On Sat, Mar 21, 2015 at 4:58 PM, Xi Shen davidshe...@gmail.com wrote: Hi, I use the *OpenBLAS* DLL, and have configured my application to work in IDE. When I start my Spark

Error while installing Spark 1.3.0 on local machine

2015-03-21 Thread HARIPRIYA AYYALASOMAYAJULA
Hello, I am trying to install Spark 1.3.0 on my mac. Earlier, I was working with Spark 1.1.0. Now, I come across this error : sbt.ResolveException: unresolved dependency: org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public in

Re: How to set Spark executor memory?

2015-03-21 Thread Ted Yu
bq. the BLAS native cannot be loaded Have you tried specifying --driver-library-path option ? Cheers On Sat, Mar 21, 2015 at 4:42 PM, Xi Shen davidshe...@gmail.com wrote: Yeah, I think it is harder to troubleshot the properties issues in a IDE. But the reason I stick to IDE is because if I

Re: Model deployment help

2015-03-21 Thread Donald Szeto
Hi Shashidhar, Our team at PredictionIO is trying to solve the production deployment of models. We built a powered-by-Spark framework (also certified on Spark by Databricks) that allows a user to build models with everything available from the Spark API, persist the model automatically with

How to do nested foreach with RDD

2015-03-21 Thread Xi Shen
Hi, I have two big RDDs, and I need to do some math against each pair of their elements. Traditionally, this is like a nested for-loop. But for RDDs, it causes a nested RDD, which is prohibited. Currently, I am collecting one of them and then doing a nested for-loop, to avoid a nested RDD. But I would like to know if
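Two standard alternatives to nesting RDDs, sketched below; compute is a hypothetical placeholder for the per-pair math:

```scala
// Option 1: cartesian product - all pairs as a single RDD.
// Beware: produces |A| x |B| elements, which can be very large.
val results = rddA.cartesian(rddB).map { case (a, b) => compute(a, b) }

// Option 2: if one RDD is small, broadcast it and flatMap over the other,
// which avoids the full shuffle of a cartesian product.
val smallB = sc.broadcast(rddB.collect())
val results2 = rddA.flatMap(a => smallB.value.map(b => compute(a, b)))
```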