Re: hive on spark query error

2015-09-25 Thread Marcelo Vanzin
Seems like you have "hive.server2.enable.doAs" enabled; you can either disable it, or configure hs2 so that the user running the service ("hadoop" in your case) can impersonate others. See: https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/Superusers.html On Fri, Sep 25,
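
For reference, a minimal sketch of the proxy-user settings in core-site.xml described in that document (assuming the service user is "hadoop"; the wildcards are illustrative and should be narrowed in production):

  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>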

Re: hive on spark query error

2015-09-25 Thread Marcelo Vanzin
On Fri, Sep 25, 2015 at 10:05 AM, Garry Chen wrote: > In spark-defaults.conf the spark.master is spark://hostname:7077. From > hive-site.xml > spark.master > hostname > That's not a valid value for spark.master (as the error indicates). You should set it to
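
For reference, a valid spark.master uses the full URL form quoted from spark-defaults.conf above, e.g. in hive-site.xml (the hostname is a placeholder):

  <property>
    <name>spark.master</name>
    <value>spark://hostname:7077</value>
  </property>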

Re: Yarn Shutting Down Spark Processing

2015-09-23 Thread Marcelo Vanzin
Did you look at your application's logs (using the "yarn logs" command?). That error means your application is failing to create a SparkContext. So either you have a bug in your code, or there will be some error in the log pointing at the actual reason for the failure. On Tue, Sep 22, 2015 at
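
A sketch of fetching the aggregated logs (the application ID is a placeholder):

  yarn logs -applicationId application_1443000000000_0001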

Re: Yarn Shutting Down Spark Processing

2015-09-23 Thread Marcelo Vanzin
But that's not the complete application log. You say the streaming context is initialized, but can you show that in the logs? There's something happening that is causing the SparkContext to not be registered with the YARN backend, and that's why your application is being killed. If you can share

Re: Exception initializing JavaSparkContext

2015-09-21 Thread Marcelo Vanzin
What Spark package are you using? In particular, which hadoop version? On Mon, Sep 21, 2015 at 9:14 AM, ekraffmiller wrote: > Hi, > I’m trying to run a simple test program to access Spark though Java. I’m > using JDK 1.8, and Spark 1.5. I’m getting an Exception

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Marcelo Vanzin
On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett wrote: > 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, > 10.1.200.245): java.lang.IllegalArgumentException: > java.net.UnknownHostException: nameservice1 > at >

Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Marcelo Vanzin
Hi, Just "spark.executor.userClassPathFirst" is not enough. You should also set "spark.driver.userClassPathFirst". Also note that I don't think this was really tested with the shell, but that should work with regular apps started using spark-submit. If that doesn't work, I'd recommend shading, as
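
A minimal sketch of setting both options at submit time (class and jar names are placeholders):

  spark-submit \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --class com.example.App app.jar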

Re: Ranger-like Security on Spark

2015-09-03 Thread Marcelo Vanzin
On Thu, Sep 3, 2015 at 5:15 PM, Matei Zaharia wrote: > Even simple Spark-on-YARN should run as the user that submitted the job, > yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of > Ranger. It's slightly more complicated than that (without

Re: Does the driver program always run local to where you submit the job from?

2015-08-26 Thread Marcelo Vanzin
On Wed, Aug 26, 2015 at 2:03 PM, Jerry jerry.c...@gmail.com wrote: Assuming you're submitting the job from a terminal; when main() is called, if I try to open a file locally, can I assume the machine is always the one I submitted the job from? See the --deploy-mode option. "client" works as you

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-25 Thread Marcelo Vanzin
On Tue, Aug 25, 2015 at 10:48 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Now I am going to try it out on our mesos cluster. I assumed spark.executor.extraClassPath takes comma-separated jars the way --jars does, but it should be ':'-separated like a regular classpath. Ah, yes, those options

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-25 Thread Marcelo Vanzin
On Tue, Aug 25, 2015 at 1:50 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: So do I need to manually copy these 2 jars on my spark executors? Yes. I can think of a way to work around that if you're using YARN, but not with other cluster managers. On Tue, Aug 25, 2015 at 10:51 AM, Marcelo

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Marcelo Vanzin
Hi Utkarsh, Unfortunately that's not going to be easy. Since Spark bundles all dependent classes into a single fat jar file, to remove that dependency you'd need to modify Spark's assembly jar (potentially in all your nodes). Doing that per-job is even trickier, because you'd probably need some

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Marcelo Vanzin
/logging/src/main/java/com/opentable/logging/AssimilateForeignLogging.java#L68 Thanks, -Utkarsh On Mon, Aug 24, 2015 at 3:04 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Utkarsh, Unfortunately that's not going to be easy. Since Spark bundles all dependent classes into a single fat jar

Re: Exclude slf4j-log4j12 from the classpath via spark-submit

2015-08-24 Thread Marcelo Vanzin
On Mon, Aug 24, 2015 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: That didn't work since extraClassPath flag was still appending the jars at the end, so its still picking the slf4j jar provided by spark. Out of curiosity, how did you verify this? The extraClassPath options are

Re: build spark 1.4.1 with JDK 1.6

2015-08-21 Thread Marcelo Vanzin
That was only true until Spark 1.3. Spark 1.4 can be built with JDK7 and pyspark will still work. On Fri, Aug 21, 2015 at 8:29 AM, Chen Song chen.song...@gmail.com wrote: Thanks Sean. So how PySpark is supported. I thought PySpark needs jdk 1.6. Chen On Fri, Aug 21, 2015 at 11:16 AM, Sean

Re: Scala: How to match a java object????

2015-08-18 Thread Marcelo Vanzin
On Tue, Aug 18, 2015 at 12:59 PM, saif.a.ell...@wellsfargo.com wrote: 5 match { case java.math.BigDecimal => 2 } 5 match { case _: java.math.BigDecimal => 2 } -- Marcelo - To unsubscribe, e-mail:

Re: Scala: How to match a java object????

2015-08-18 Thread Marcelo Vanzin
the typed pattern example. -Original Message- From: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: Tuesday, August 18, 2015 5:15 PM To: Ellafi, Saif A. Cc: wrbri...@gmail.com; user@spark.apache.org Subject: Re: Scala: How to match a java object On Tue, Aug 18, 2015 at 12:59 PM

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Marcelo Vanzin
On Fri, Aug 14, 2015 at 2:11 PM, Varadhan, Jawahar varad...@yahoo.com.invalid wrote: And hence, I was planning to use Spark Streaming with Kafka or Flume with Kafka. But flume runs on a JVM and may not be the best option as the huge file will create memory issues. Please suggest someway to

Re: Contributors group and starter task

2015-08-03 Thread Marcelo Vanzin
Hi Namit, There's no need to assign a bug to yourself to say you're working on it. The recommended way is to just post a PR on github - the bot will update the bug saying that you have a patch open to fix the issue. On Mon, Aug 3, 2015 at 3:50 PM, Namit Katariya katariya.na...@gmail.com wrote:

Re: Topology.py -- Cannot run on Spark Gateway on Cloudera 5.4.4.

2015-08-03 Thread Marcelo Vanzin
That should not be a fatal error, it's just a noisy exception. Anyway, it should go away if you add YARN gateways to those nodes (aside from Spark gateways). On Mon, Aug 3, 2015 at 7:10 PM, Upen N ukn...@gmail.com wrote: Hi, I recently installed Cloudera CDH 5.4.4. Sparks comes shipped with

Re: No event logs in yarn-cluster mode

2015-08-01 Thread Marcelo Vanzin
On Sat, Aug 1, 2015 at 9:25 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: When I run locally (./run-example SparkPi), the event logs are being created, and I can start the history server. But when I am trying ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster

Re: How to add multiple sequence files from HDFS to a Spark Context to do Batch processing?

2015-07-31 Thread Marcelo Vanzin
file can be a directory (look at all children) or even a glob (/path/*.ext, for example). On Fri, Jul 31, 2015 at 11:35 AM, swetha swethakasire...@gmail.com wrote: Hi, How to add multiple sequence files from HDFS to a Spark Context to do Batch processing? I have something like the following

Re: Problem submiting an script .py against an standalone cluster.

2015-07-30 Thread Marcelo Vanzin
Can you share the part of the code in your script where you create the SparkContext instance? On Thu, Jul 30, 2015 at 7:19 PM, fordfarline fordfarl...@gmail.com wrote: Hi All, I'm having an issue when launching an app (python) against a standalone cluster, but it runs in local mode, as it doesn't

Re: NO Cygwin Support in bin/spark-class in Spark 1.4.0

2015-07-28 Thread Marcelo Vanzin
Can you run the windows batch files (e.g. spark-submit.cmd) from the cygwin shell? On Tue, Jul 28, 2015 at 7:26 PM, Proust GZ Feng pf...@cn.ibm.com wrote: Hi, Owen Add back the cygwin classpath detection can pass the issue mentioned before, but there seems lack of further support in the

Re: Which directory contains third party libraries for Spark

2015-07-28 Thread Marcelo Vanzin
Hi Stephen, There is no such directory currently. If you want to add an existing jar to every app's classpath, you need to modify two config values: spark.driver.extraClassPath and spark.executor.extraClassPath. On Mon, Jul 27, 2015 at 10:22 PM, Stephen Boesch java...@gmail.com wrote: when
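
A sketch of the two settings in spark-defaults.conf (the jar path is a placeholder and must exist at the same path on every node):

  spark.driver.extraClassPath=/opt/libs/extra.jar
  spark.executor.extraClassPath=/opt/libs/extra.jar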

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Marcelo Vanzin
This might be an issue with how pyspark propagates the error back to the AM. I'm pretty sure this does not happen for Scala / Java apps. Have you filed a bug? On Tue, Jul 28, 2015 at 11:17 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: Thanks Corey for your answer, Do you mean that final

Re: [ Potential bug ] Spark terminal logs say that job has succeeded even though job has failed in Yarn cluster mode

2015-07-28 Thread Marcelo Vanzin
BTW this is most probably caused by this line in PythonRunner.scala: System.exit(process.waitFor()) The YARN backend doesn't like applications calling System.exit(). On Tue, Jul 28, 2015 at 12:00 PM, Marcelo Vanzin van...@cloudera.com wrote: This might be an issue with how pyspark

Re: Spark 1.4.0 compute-classpath.sh

2015-07-15 Thread Marcelo Vanzin
That has never been the correct way to set your app's classpath. Instead, look at http://spark.apache.org/docs/latest/configuration.html and search for extraClassPath. On Wed, Jul 15, 2015 at 9:43 AM, lokeshkumar lok...@dataken.net wrote: Hi forum I have downloaded the latest spark version

Re: Spark and HDFS

2015-07-15 Thread Marcelo Vanzin
On Wed, Jul 15, 2015 at 5:36 AM, Jeskanen, Elina elina.jeska...@cgi.com wrote: I have Spark 1.4 on my local machine and I would like to connect to our local 4 nodes Cloudera cluster. But how? In the example it says text_file = spark.textFile(hdfs://...), but can you advise me in where to

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora shushantaror...@gmail.com wrote: When I specify --executor-cores > 4 it fails to start the application. When I give --executor-cores as 4, it works fine. Do you have any NM that advertises more than 4 available cores? Also, it's always worth it

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 11:13 AM, Shushant Arora shushantaror...@gmail.com wrote: spark-submit --class classname --num-executors 10 --executor-cores 4 --master <masteradd> jarname Will it allocate 10 containers throughout the life of the streaming application on the same nodes until any node failure

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 12:03 PM, Shushant Arora shushantaror...@gmail.com wrote: Can a container have multiple JVMs running in YARN? Yes and no. A container runs a single command, but that process can start other processes, and those also count towards the resource usage of the container

Re: Why does SparkSubmit process takes so much virtual memory in yarn-cluster mode ?

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 3:42 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: I looked into Virtual memory usage (jmap+jvisualvm) does not show that 11.5 g Virtual Memory usage - it is much less. I get 11.5 g Virtual memory usage using top -p pid command for SparkSubmit process. If you're

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora shushantaror...@gmail.com wrote: My understanding was --executor-cores(5 here) are maximum concurrent tasks possible in an executor and --num-executors (10 here)are no of executors or containers demanded by Application master/Spark driver

Re: Why does SparkSubmit process takes so much virtual memory in yarn-cluster mode ?

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov elkhan8...@gmail.com wrote: While the program is running, these are the stats of how much memory each process takes: SparkSubmit process: 11.266 gigabyte Virtual Memory; ApplicationMaster process: 2303480 byte Virtual Memory. That

Re: spark on yarn

2015-07-14 Thread Marcelo Vanzin
On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora shushantaror...@gmail.com wrote: Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per container? I don't remember YARN config names by heart, but that sounds promising. I'd look at the YARN documentation for details.

Re: Pyspark not working on yarn-cluster mode

2015-07-09 Thread Marcelo Vanzin
You cannot run Spark in cluster mode by instantiating a SparkContext like that. You have to launch it with the spark-submit command line script. On Thu, Jul 9, 2015 at 2:23 PM, jegordon jgordo...@gmail.com wrote: Hi to all, Is there any way to run pyspark scripts with yarn-cluster mode

Re: is it possible to disable -XX:OnOutOfMemoryError=kill %p for the executors?

2015-07-07 Thread Marcelo Vanzin
SIGTERM on YARN generally means the NM is killing your executor because it's running over its requested memory limits. Check your NM logs to make sure. And then take a look at the memoryOverhead setting for driver and executors (http://spark.apache.org/docs/latest/running-on-yarn.html). On Tue,
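
A sketch of raising the overhead using the Spark 1.x property names (values in MB, chosen for illustration):

  spark-submit \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --conf spark.yarn.driver.memoryOverhead=512 \
    ...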

Re: Problem after enabling Hadoop native libraries

2015-06-26 Thread Marcelo Vanzin
What master are you using? If this is not a local master, you'll need to set LD_LIBRARY_PATH on the executors also (using spark.executor.extraLibraryPath). If you are using local, then I don't know what's going on. On Fri, Jun 26, 2015 at 1:39 AM, Arunabha Ghosh arunabha...@gmail.com wrote:
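
A sketch of the executor setting (the path is a placeholder for wherever the Hadoop native libraries live on the worker nodes):

  spark-submit --conf spark.executor.extraLibraryPath=/opt/hadoop/lib/native ...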

Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Marcelo Vanzin
with Mesos. On Fri, Jun 26, 2015 at 1:20 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen t...@mesosphere.io wrote: So correct me if I'm wrong, sounds like all you need is a principal user name and also a keytab file downloaded right? I'm not familiar

Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Marcelo Vanzin
On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen t...@mesosphere.io wrote: So correct me if I'm wrong, sounds like all you need is a principal user name and also a keytab file downloaded right? I'm not familiar with Mesos so don't know what kinds of features it has, but at the very least it would

Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Marcelo Vanzin
On Fri, Jun 26, 2015 at 3:09 PM, Dave Ariens dari...@blackberry.com wrote: Would there be any way to have the task instances in the slaves call the UGI login with a principal/keytab provided to the driver? That would only work with a very small number of executors. If you have many login

Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

2015-06-26 Thread Marcelo Vanzin
. You can check the Hadoop sources for details. Not sure if there's another way. From: Marcelo Vanzin Sent: Friday, June 26, 2015 6:20 PM To: Dave Ariens Cc: Tim Chen; Olivier Girardot; user@spark.apache.org Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark

Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Marcelo Vanzin
, Marcelo Vanzin van...@cloudera.com wrote: That sounds like SPARK-5479 which is not in 1.4... On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: In addition to previous emails, when i try to execute this command from command line: ./bin/spark-submit --verbose --master

Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Marcelo Vanzin
That sounds like SPARK-5479 which is not in 1.4... On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com wrote: In addition to previous emails, when i try to execute this command from command line: ./bin/spark-submit --verbose --master yarn-cluster --py-files

Re: how to use a properties file from a url in spark-submit

2015-06-11 Thread Marcelo Vanzin
That's not supported. You could use wget / curl to download the file to a temp location before running spark-submit, though. On Thu, Jun 11, 2015 at 12:48 PM, Gary Ogden gog...@gmail.com wrote: I have a properties file that is hosted at a url. I would like to be able to use the url in the
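
A sketch of that workaround (URL and paths are placeholders):

  curl -o /tmp/app.properties http://example.com/app.properties
  spark-submit --properties-file /tmp/app.properties --class com.example.App app.jar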

Re: spark uses too much memory maybe (binaryFiles() with more than 1 million files in HDFS), groupBy or reduceByKey()

2015-06-10 Thread Marcelo Vanzin
So, I don't have an explicit solution to your problem, but... On Wed, Jun 10, 2015 at 7:13 AM, Kostas Kougios kostas.koug...@googlemail.com wrote: I am profiling the driver. It currently has 564MB of strings which might be the 1mil file names. But also it has 2.34 GB of long[] ! That's so

Re: PYTHONPATH on worker nodes

2015-06-10 Thread Marcelo Vanzin
I don't think it's propagated automatically. Try this: spark-submit --conf spark.executorEnv.PYTHONPATH=... ... On Wed, Jun 10, 2015 at 8:15 AM, Bob Corsaro rcors...@gmail.com wrote: I'm setting PYTHONPATH before calling pyspark, but the worker nodes aren't inheriting it. I've tried looking
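
The suggested form, spelled out (path and script name are placeholders):

  spark-submit --conf spark.executorEnv.PYTHONPATH=/opt/libs/python myapp.py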

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
If your application is stuck in that state, it generally means your cluster doesn't have enough resources to start it. In the RM logs you can see how many vcores / memory the application is asking for, and then you can check your RM configuration to see if that's currently available on any single

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
this. On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote: Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now - this problem is specific to Spark. That doesn't necessarily

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
, it's broken for good. On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com wrote: Apologies, I see you already posted everything from the RM logs that mention your stuck app. Have you tried restarting the YARN cluster to see if that changes anything? Does it go back

Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote: Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now - this problem is specific to Spark. That doesn't necessarily mean anything. Spark apps have different resource requirements than Hadoop apps.

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote: Initially I had issues passing the SparkContext to other threads as it is not serializable. Eventually I found that adding the @transient annotation prevents a NotSerializableException. This is really puzzling. How are

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
Ignoring the serialization thing (seems like a red herring): On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden splee...@gmail.com wrote: 15/06/05 11:35:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoSuchMethodError:

Re: SparkContext Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 12:55 PM, Lee McFadden splee...@gmail.com wrote: Regarding serialization, I'm still confused as to why I was getting a serialization error in the first place as I'm executing these Runnable classes from a java thread pool. I'm fairly new to Scala/JVM world and there

Re: Problem reading Parquet from 1.2 to 1.3

2015-06-04 Thread Marcelo Vanzin
I talked to Don outside the list and he says that he's seeing this issue with Apache Spark 1.3 too (not just CDH Spark), so it seems like there is a real issue here. On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote: As part of upgrading a cluster from CDH 5.3.x to CDH 5.4.x I

Re: Problem reading Parquet from 1.2 to 1.3

2015-06-03 Thread Marcelo Vanzin
(bcc: user@spark, cc: cdh-user@cloudera) If you're using CDH, Spark SQL is currently unsupported and mostly untested, so I'd recommend not trying to use it in CDH. You could try an upstream version of Spark instead. On Wed, Jun 3, 2015 at 1:39 PM, Don Drake dondr...@gmail.com wrote: As part of

Re: Spark 1.4 YARN Application Master fails with 500 connect refused

2015-06-02 Thread Marcelo Vanzin
That code hasn't changed at all between 1.3 and 1.4; it also has been working fine for me. Are you sure you're using exactly the same Hadoop libraries (since you're building with -Phadoop-provided) and Hadoop configuration in both cases? On Tue, Jun 2, 2015 at 5:29 PM, Night Wolf

Re: View all user's application logs in history server

2015-05-27 Thread Marcelo Vanzin
You may be the only one not seeing all the logs. Are you sure all the users are writing to the same log directory? The HS can only read from a single log directory. On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: No one using History server? :) Am I the only one

Re: View all user's application logs in history server

2015-05-27 Thread Marcelo Vanzin
, Jianshi Huang jianshi.hu...@gmail.com wrote: Yes, all written to the same directory on HDFS. Jianshi On Wed, May 27, 2015 at 11:57 PM, Marcelo Vanzin van...@cloudera.com wrote: You may be the only one not seeing all the logs. Are you sure all the users are writing to the same log directory

Re: Running Javascript from scala spark

2015-05-26 Thread Marcelo Vanzin
Is it just me or does that look completely unrelated to Spark-the-Apache-project? On Tue, May 26, 2015 at 10:55 AM, Ted Yu yuzhih...@gmail.com wrote: Have you looked at https://github.com/spark/sparkjs ? Cheers On Tue, May 26, 2015 at 10:17 AM, marcos rebelo ole...@gmail.com wrote: Hi

Re: Spark HistoryServer not coming up

2015-05-21 Thread Marcelo Vanzin
Seems like there might be a mismatch between your Spark jars and your cluster's HDFS version. Make sure you're using the Spark jar that matches the hadoop version of your cluster. On Thu, May 21, 2015 at 8:48 AM, roy rp...@njit.edu wrote: Hi, After restarting Spark HistoryServer, it failed

Re: --jars works in yarn-client but not yarn-cluster mode, why?

2015-05-20 Thread Marcelo Vanzin
$RemotingTerminator: Shutting down remote daemon. 15/05/19 14:10:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 15/05/19 14:10:47 INFO spark.SparkContext: Successfully stopped SparkContext 2015-05-19 1:12 GMT+08:00 Marcelo Vanzin

Re: py-files (and others?) not properly set up in cluster-mode Spark Yarn job?

2015-05-18 Thread Marcelo Vanzin
Hi Shay, Yeah, that seems to be a bug; it doesn't seem to be related to the default FS nor compareFs either - I can reproduce this with HDFS when copying files from the local fs too. In yarn-client mode things seem to work. Could you file a bug to track this? If you don't have a jira account I

Re: Spark's Guava pieces cause exceptions in non-trivial deployments

2015-05-15 Thread Marcelo Vanzin
exactly the same as SPARK_CLASSPATH. It would be nice to know whether that is also the case in 1.4 (I took a quick look at the related code and it seems correct), but I don't have Mesos around to test. On Fri, May 15, 2015 at 12:04 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, May

Re: Spark's Guava pieces cause exceptions in non-trivial deployments

2015-05-15 Thread Marcelo Vanzin
if those options worked differently from SPARK_CLASSPATH, since they were meant to replace it. On Fri, May 15, 2015 at 11:54 AM, Marcelo Vanzin van...@cloudera.com wrote: Ah, I see. yeah, it sucks that Spark has to expose Optional (and things it depends on), but removing that would break

Re: Spark's Guava pieces cause exceptions in non-trivial deployments

2015-05-14 Thread Marcelo Vanzin
What version of Spark are you using? The bug you mention is only about the Optional class (and a handful of others, but none of the classes you're having problems with). All other Guava classes should be shaded since Spark 1.2, so you should be able to use your own version of Guava with no

Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Marcelo Vanzin
Are you actually running anything that requires all those slots? e.g., locally, I get this with local[16], but only after I run something that actually uses those 16 slots: Executor task launch worker-15 daemon prio=10 tid=0x7f4c80029800 nid=0x8ce waiting on condition [0x7f4c62493000]

Re: spark : use the global config variables in executors

2015-05-11 Thread Marcelo Vanzin
Note that `object` is equivalent to a class full of static fields / methods (in Java), so the data it holds will not be serialized, ever. What you want is a config class instead, so you can instantiate it, and that instance can be serialized. Then you can easily do (1) or (3). On Mon, May 11,

Re: history server

2015-05-07 Thread Marcelo Vanzin
-07:00 Marcelo Vanzin van...@cloudera.com: Can you get a jstack for the process? Maybe it's stuck somewhere. On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote: i am trying to launch the spark 1.3.1 history server on a secure cluster. i can see in the logs

Re: history server

2015-05-07 Thread Marcelo Vanzin
(Interpreted frame) On Thu, May 7, 2015 at 2:17 PM, Koert Kuipers ko...@tresata.com wrote: good idea i will take a look. it does seem to be spinning one cpu at 100%... On Thu, May 7, 2015 at 2:03 PM, Marcelo Vanzin van...@cloudera.com wrote: Can you get a jstack for the process? Maybe it's stuck

Re: history server

2015-05-07 Thread Marcelo Vanzin
Can you get a jstack for the process? Maybe it's stuck somewhere. On Thu, May 7, 2015 at 11:00 AM, Koert Kuipers ko...@tresata.com wrote: i am trying to launch the spark 1.3.1 history server on a secure cluster. i can see in the logs that it successfully logs into kerberos, and it is
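
A sketch of capturing that dump (the pid is a placeholder for the history server's process ID):

  jstack -l 12345 > historyserver.jstack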

Re: SparkSQL issue: Spark 1.3.1 + hadoop 2.6 on CDH5.3 with parquet

2015-05-07 Thread Marcelo Vanzin
On Thu, May 7, 2015 at 7:39 PM, felicia shsh...@tsmc.com wrote: we tried to add /usr/lib/parquet/lib /usr/lib/parquet to SPARK_CLASSPATH and it doesn't seem to work, To add the jars to the classpath you need to use /usr/lib/parquet/lib/*, otherwise you're just adding the directory (and not
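
The difference, spelled out as a sketch (SPARK_CLASSPATH as used in the thread):

  # adds only the directory itself, not the jars inside it
  export SPARK_CLASSPATH="/usr/lib/parquet/lib"
  # the trailing /* adds every jar in the directory
  export SPARK_CLASSPATH="/usr/lib/parquet/lib/*"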

Re: what does Container exited with a non-zero exit code 10 means?

2015-05-05 Thread Marcelo Vanzin
What Spark tarball are you using? You may want to try the one for hadoop 2.6 (the one for hadoop 2.4 may cause that issue, IIRC). On Tue, May 5, 2015 at 6:54 PM, felicia shsh...@tsmc.com wrote: Hi all, We're trying to implement SparkSQL on CDH5.3.0 with cluster mode, and we get this error

Re: JAVA_HOME problem

2015-04-28 Thread Marcelo Vanzin
Are you using a Spark build that matches your YARN cluster version? That seems like it could happen if you're using a Spark built against a newer version of YARN than you're running. On Thu, Apr 2, 2015 at 12:53 AM, 董帅阳 917361...@qq.com wrote: spark 1.3.0 spark@pc-zjqdyyn1:~ tail

Re: How to debug Spark on Yarn?

2015-04-24 Thread Marcelo Vanzin
On top of what's been said... On Wed, Apr 22, 2015 at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: 1) I can go to Spark UI and see the status of the APP but cannot see the logs as the job progresses. How can I see logs of executors as they progress? Spark 1.3 should have links to the

Re: Spark SQL - Setting YARN Classpath for primordial class loader

2015-04-23 Thread Marcelo Vanzin
You'd have to use spark.{driver,executor}.extraClassPath to modify the system class loader. But that also means you have to manually distribute the jar to the nodes in your cluster, into a common location. On Thu, Apr 23, 2015 at 6:38 PM, Night Wolf nightwolf...@gmail.com wrote: Hi guys,

Re: Spark SQL - Setting YARN Classpath for primordial class loader

2015-04-23 Thread Marcelo Vanzin
No, those have to be local paths. On Thu, Apr 23, 2015 at 6:53 PM, Night Wolf nightwolf...@gmail.com wrote: Thanks Marcelo, can this be a path on HDFS? On Fri, Apr 24, 2015 at 11:52 AM, Marcelo Vanzin van...@cloudera.com wrote: You'd have to use spark.{driver,executor}.extraClassPath

Re: spark.dynamicAllocation.minExecutors

2015-04-16 Thread Marcelo Vanzin
I think Michael is referring to this: Exception in thread "main" java.lang.IllegalArgumentException: You must specify at least 1 executor! Usage: org.apache.spark.deploy.yarn.Client [options] spark-submit --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=0

Re: Spark Job #of attempts ?

2015-04-09 Thread Marcelo Vanzin
Set spark.yarn.maxAppAttempts=1 if you don't want retries. On Thu, Apr 9, 2015 at 10:31 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Hello, I have a spark job with 5 stages. After it runs 3rd stage, the console shows 15/04/09 10:25:57 INFO yarn.Client: Application report for
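
A sketch (application details are placeholders):

  spark-submit --conf spark.yarn.maxAppAttempts=1 --master yarn-cluster --class com.example.App app.jar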

Re: Error running Spark on Cloudera

2015-04-08 Thread Marcelo Vanzin
spark.eventLog.dir should contain the full HDFS URL. In general, this should be sufficient: spark.eventLog.dir=hdfs:/user/spark/applicationHistory On Wed, Apr 8, 2015 at 6:45 AM, Vijayasarathy Kannan kvi...@vt.edu wrote: I am trying to run a Spark application using spark-submit on a cluster
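
A sketch of the matching spark-defaults.conf lines (directory as in the message; the directory typically has to exist in HDFS already):

  spark.eventLog.enabled=true
  spark.eventLog.dir=hdfs:/user/spark/applicationHistory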

Re: A problem with Spark 1.3 artifacts

2015-04-07 Thread Marcelo Vanzin
BTW, just out of curiosity, I checked both the 1.3.0 release assembly and the spark-core_2.10 artifact downloaded from http://mvnrepository.com/, and neither contain any references to anything under org.eclipse (all referenced jetty classes are the shaded ones under org.spark-project.jetty). On

Re: Can not get executor's Log from Spark's History Server

2015-04-07 Thread Marcelo Vanzin
The Spark history server does not have the ability to serve executor logs currently. You need to use the yarn logs command for that. On Tue, Apr 7, 2015 at 2:51 AM, donhoff_h 165612...@qq.com wrote: Hi, Experts I run my Spark Cluster on Yarn. I used to get executors' Logs from Spark's History

Re: A problem with Spark 1.3 artifacts

2015-04-07 Thread Marcelo Vanzin
Maybe you have some sbt-built 1.3 version in your ~/.ivy2/ directory that's masking the maven one? That's the only explanation I can come up with... On Tue, Apr 7, 2015 at 12:22 PM, Jacek Lewandowski jacek.lewandow...@datastax.com wrote: So weird, as I said - I created a new empty project

Re: Spark-events does not exist error, while it does with all the req. rights

2015-04-02 Thread Marcelo Vanzin
FYI I wrote a small test to try to reproduce this, and filed SPARK-6688 to track the fix. On Tue, Mar 31, 2015 at 1:15 PM, Marcelo Vanzin van...@cloudera.com wrote: Hmmm... could you try to set the log dir to file:/home/hduser/spark/spark-events? I checked the code and it might be the case

Re: Unable to run Spark application

2015-04-01 Thread Marcelo Vanzin
Try sbt assembly instead. On Wed, Apr 1, 2015 at 10:09 AM, Vijayasarathy Kannan kvi...@vt.edu wrote: Why do I get "Failed to find Spark assembly JAR. You need to build Spark before running this program."? I downloaded spark-1.2.1.tgz from the downloads page and extracted it. When I do sbt

Re: Spark-events does not exist error, while it does with all the req. rights

2015-03-31 Thread Marcelo Vanzin
a text file, closed it and viewed it, and deleted it (iii). My findings were reconfirmed by my colleague. Any other ideas? Thanks, Tom On 30 March 2015 at 19:19, Marcelo Vanzin van...@cloudera.com wrote: So, the error below is still showing the invalid configuration. You mentioned

Re: Spark-events does not exist error, while it does with all the req. rights

2015-03-30 Thread Marcelo Vanzin
Are those config values in spark-defaults.conf? I don't think you can use ~ there - IIRC it does not do any kind of variable expansion. On Mon, Mar 30, 2015 at 3:50 PM, Tom thubregt...@gmail.com wrote: I have set spark.eventLog.enabled true as I try to preserve log files. When I run, I get

Re: Spark 1.3.0 Build Failure

2015-03-30 Thread Marcelo Vanzin
This sounds like SPARK-6532. On Mon, Mar 30, 2015 at 1:34 PM, ARose ashley.r...@telarix.com wrote: So, I am trying to build Spark 1.3.0 (standalone mode) on Windows 7 using Maven, but I'm getting a build failure. java -version java version "1.8.0_31" Java(TM) SE Runtime Environment (build

Re: Spark-events does not exist error, while it does with all the req. rights

2015-03-30 Thread Marcelo Vanzin
and spark-env: Log directory /home/hduser/spark/spark-events does not exist. (Also, in the default /tmp/spark-events it also did not work) On 30 March 2015 at 18:03, Marcelo Vanzin van...@cloudera.com wrote: Are those config values in spark-defaults.conf? I don't think you can use ~ there - IIRC

Re: Spark-events does not exist error, while it does with all the req. rights

2015-03-30 Thread Marcelo Vanzin
So, the error below is still showing the invalid configuration. You mentioned in the other e-mails that you also changed the configuration, and that the directory really, really exists. Given the exception below, the only ways you'd get the error with a valid configuration would be if (i) the

Re: Spark History Server : jobs link doesn't open

2015-03-26 Thread Marcelo Vanzin
bcc: user@, cc: cdh-user@ I recommend using CDH's mailing list whenever you have a problem with CDH. That being said, you haven't provided enough info to debug the problem. Since you're using CM, you can easily go look at the History Server's logs and see what the underlying error is. On Thu,

Re: Spark shell never leaves ACCEPTED state in YARN CDH5

2015-03-25 Thread Marcelo Vanzin
That probably means there are not enough free resources in your cluster to run the AM for the Spark job. Check your RM's web UI to see the resources you have available. On Wed, Mar 25, 2015 at 12:08 PM, Khandeshi, Ami ami.khande...@fmr.com.invalid wrote: I am seeing the same behavior. I have

Re: Does HiveContext connect to HiveServer2?

2015-03-24 Thread Marcelo Vanzin
spark-submit --files /path/to/hive-site.xml On Tue, Mar 24, 2015 at 10:31 AM, Udit Mehta ume...@groupon.com wrote: Another question related to this, how can we propagate the hive-site.xml to all workers when running in the yarn cluster mode? On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin
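
A sketch for yarn-cluster mode (the local path to hive-site.xml is a placeholder):

  spark-submit --files /etc/hive/conf/hive-site.xml --master yarn-cluster --class com.example.App app.jar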

Re: Does HiveContext connect to HiveServer2?

2015-03-24 Thread Marcelo Vanzin
It does neither. If you provide a Hive configuration to Spark, HiveContext will connect to your metastore server, otherwise it will create its own metastore in the working directory (IIRC). On Tue, Mar 24, 2015 at 8:58 AM, nitinkak001 nitinkak...@gmail.com wrote: I am wondering if HiveContext

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-24 Thread Marcelo Vanzin
/spark-submit --class App1 --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true $HOME/projects/sparkapp/target/scala-2.10/sparkapp-assembly-1.0.jar Thanks, Alexey On Tue, Mar 24, 2015 at 5:03 AM, Marcelo Vanzin van...@cloudera.com wrote: You could build

Re: Invalid ContainerId ... Caused by: java.lang.NumberFormatException: For input string: e04

2015-03-24 Thread Marcelo Vanzin
Hi there, On Tue, Mar 24, 2015 at 1:40 PM, Manoj Samel manojsamelt...@gmail.com wrote: When I run any query, it gives java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; Are you running a custom-compiled Spark by any chance?

Re: FAILED SelectChannelConnector@0.0.0.0:4040 java.net.BindException: Address already in use

2015-03-24 Thread Marcelo Vanzin
Does your application actually fail? That message just means there's another application listening on that port. Spark should try to bind to a different one after that and keep going. On Tue, Mar 24, 2015 at 12:43 PM, Roy rp...@njit.edu wrote: I get the following message each time I run spark

Re: Spark 1.3 Dynamic Allocation - Requesting 0 new executor(s) because tasks are backlogged

2015-03-23 Thread Marcelo Vanzin
On Mon, Mar 23, 2015 at 2:15 PM, Manoj Samel manojsamelt...@gmail.com wrote: Found the issue above error - the setting for spark_shuffle was incomplete. Now it is able to ask and get additional executors. The issue is once they are released, it is not able to proceed with next query. That

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-23 Thread Marcelo Vanzin
You could build a fat jar for your application containing both your code and the json4s library, and then run Spark with these two options: spark.driver.userClassPathFirst=true spark.executor.userClassPathFirst=true Both only work in 1.3. (1.2 has spark.files.userClassPathFirst, but that
