Re: Invalid ContainerId ... Caused by: java.lang.NumberFormatException: For input string: e04

2015-03-23 Thread Marcelo Vanzin
This happens most probably because the Spark 1.3 you have downloaded is built against an older version of the Hadoop libraries than those used by CDH, and those libraries cannot parse the container IDs generated by CDH. You can try to work around this by manually adding CDH jars to the front of
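
A sketch of that workaround (the parcel path and app details are illustrative; point the settings at wherever your CDH Hadoop jars actually live):

    spark-submit \
      --conf spark.driver.extraClassPath='/opt/cloudera/parcels/CDH/lib/hadoop/client/*' \
      --conf spark.executor.extraClassPath='/opt/cloudera/parcels/CDH/lib/hadoop/client/*' \
      --master yarn-cluster --class com.example.MyApp myapp.jar  # hypothetical app

The extraClassPath entries are prepended to the classpath, so the CDH jars win over the ones bundled with the downloaded Spark.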

Re: WebUI on yarn through ssh tunnel affected by AmIpfilter

2015-03-20 Thread Marcelo Vanzin
Instead of opening a tunnel to the Spark web ui port, could you open a tunnel to the YARN RM web ui instead? That should allow you to navigate to the Spark application's web ui through the RM proxy, and hopefully that will work better. On Fri, Feb 6, 2015 at 9:08 PM, yangqch
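
For example, a tunnel to the RM web UI rather than the Spark one (host names hypothetical; 8088 is the RM web UI's default port):

    ssh -L 8088:rm-host.example.com:8088 user@gateway.example.com
    # then browse http://localhost:8088 and follow the application's
    # ApplicationMaster link, which goes through the RM proxy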

Re: spark there is no space on the disk

2015-03-19 Thread Marcelo Vanzin
IIRC you have to set that configuration on the Worker processes (for standalone). The app can't override it (only for a client-mode driver). YARN has a similar configuration, but I don't know the name (shouldn't be hard to find, though). On Thu, Mar 19, 2015 at 11:56 AM, Davies Liu
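
A sketch for standalone mode (directories hypothetical); set this in conf/spark-env.sh on every worker and restart the workers. On YARN the analogous knob is yarn.nodemanager.local-dirs in yarn-site.xml:

    # conf/spark-env.sh on each standalone worker
    export SPARK_LOCAL_DIRS=/data1/spark-tmp,/data2/spark-tmp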

Re: Spark Job History Server

2015-03-18 Thread Marcelo Vanzin
Those classes are not part of standard Spark. You may want to contact Hortonworks directly if they're suggesting you use those. On Wed, Mar 18, 2015 at 3:30 AM, patcharee patcharee.thong...@uni.no wrote: Hi, I am using spark 1.3. I would like to use Spark Job History Server. I added the

Re: Using a different spark jars than the one on the cluster

2015-03-18 Thread Marcelo Vanzin
Since you're using YARN, you should be able to download a Spark 1.3.0 tarball from Spark's website and use spark-submit from that installation to launch your app against the YARN cluster. So effectively you would have 1.2.0 and 1.3.0 side-by-side in your cluster. On Wed, Mar 18, 2015 at 11:09
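
A sketch of the side-by-side setup (tarball and app names illustrative; pick the build matching your Hadoop version):

    tar xzf spark-1.3.0-bin-hadoop2.4.tgz
    export HADOOP_CONF_DIR=/etc/hadoop/conf     # point at the cluster's config
    ./spark-1.3.0-bin-hadoop2.4/bin/spark-submit --master yarn-cluster \
      --class com.example.MyApp myapp.jar       # hypothetical app

The existing 1.2.0 installation is untouched; only jobs launched from the new directory use 1.3.0.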

Re: InvalidAuxServiceException in dynamicAllocation

2015-03-17 Thread Marcelo Vanzin
I assume you're running YARN given the exception. I don't know if this is covered in the documentation (I took a quick look at the config document and didn't see references to it), but you need to configure Spark's external shuffle service as an auxiliary NodeManager service in your YARN
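
A minimal sketch of that NodeManager configuration, assuming the standard spark_shuffle service name from the Spark docs (add to yarn-site.xml on every NodeManager, put spark-<version>-yarn-shuffle.jar on the NodeManager classpath, and restart the NodeManagers):

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

On the Spark side, set spark.shuffle.service.enabled=true along with the dynamic allocation settings.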

Re: How do I alter the combination of keys that exit the Spark shell?

2015-03-13 Thread Marcelo Vanzin
+ ALT + V for copying commands in the shell) and that results in closing my shell. In order to solve this I was wondering if I could just deactivate the CTRL + C combination altogether. Any ideas? // Adamantios On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin van...@cloudera.com wrote: You can type

Re: How do I alter the combination of keys that exit the Spark shell?

2015-03-13 Thread Marcelo Vanzin
commands in the shell) and that results in closing my shell. In order to solve this I was wondering if I could just deactivate the CTRL + C combination altogether. Any ideas? // Adamantios On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin van...@cloudera.com wrote: You can type :quit. On Fri

Re: How do I alter the combination of keys that exit the Spark shell?

2015-03-13 Thread Marcelo Vanzin
You can type :quit. On Fri, Mar 13, 2015 at 10:29 AM, Adamantios Corais adamantios.cor...@gmail.com wrote: Hi, I want to change the default combination of keys that exit the Spark shell (i.e. CTRL + C) to something else, such as CTRL + H? Thank you in advance. // Adamantios -- Marcelo

Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-03-06 Thread Marcelo Vanzin
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote: I am trying to run a Hive query from Spark using HiveContext. Here is the code / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest) conf.set(spark.executor.extraClassPath,

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Marcelo Vanzin
It seems from the excerpt below that your cluster is set up to use the Yarn ATS, and the code is failing in that path. I think you'll need to apply the following patch to your Spark sources if you want this to work: https://github.com/apache/spark/pull/3938 On Thu, Mar 5, 2015 at 10:04 AM, Todd

Re: Building Spark 1.3 for Scala 2.11 using Maven

2015-03-05 Thread Marcelo Vanzin
I've never tried it, but I'm pretty sure at the very least you want -Pscala-2.11 (not -D). On Thu, Mar 5, 2015 at 4:46 PM, Night Wolf nightwolf...@gmail.com wrote: Hey guys, Trying to build Spark 1.3 for Scala 2.11. I'm running with the following Maven command; -DskipTests -Dscala-2.11

Re: Building Spark 1.3 for Scala 2.11 using Maven

2015-03-05 Thread Marcelo Vanzin
Ah, and you may have to use dev/change-version-to-2.11.sh. (Again, never tried compiling with scala 2.11.) On Thu, Mar 5, 2015 at 4:52 PM, Marcelo Vanzin van...@cloudera.com wrote: I've never tried it, but I'm pretty sure in the very least you want -Pscala-2.11 (not -D). On Thu, Mar 5, 2015
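
Putting both replies together, the build would look roughly like this (the extra profiles are illustrative; use whichever match your cluster):

    ./dev/change-version-to-2.11.sh
    mvn -Pscala-2.11 -Pyarn -Phadoop-2.4 -DskipTests clean package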

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-04 Thread Marcelo Vanzin
On Wed, Mar 4, 2015 at 10:08 AM, Srini Karri skarri@gmail.com wrote: spark.executor.extraClassPath D:\\Apache\\spark-1.2.1-bin-hadoop2\\spark-1.2.1-bin-hadoop2.4\\bin\\classes spark.eventLog.dir D:/Apache/spark-1.2.1-bin-hadoop2/spark-1.2.1-bin-hadoop2.4/bin/tmp/spark-events

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Marcelo Vanzin
, Mar 4, 2015 at 4:10 PM, Marcelo Vanzin van...@cloudera.com wrote: Seems like someone set up m2.mines.com as a mirror in your pom file or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is in a messed up state). On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 kpe...@gmail.com wrote

Re: Issues with maven dependencies for version 1.2.0 but not version 1.1.0

2015-03-04 Thread Marcelo Vanzin
Seems like someone set up m2.mines.com as a mirror in your pom file or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is in a messed up state). On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 kpe...@gmail.com wrote: Hi All, I am currently having problem with the maven dependencies for

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-03 Thread Marcelo Vanzin
Spark applications shown in the RM's UI should have an Application Master link when they're running. That takes you to the Spark UI for that application where you can see all the information you're looking for. If you're running a history server and add spark.yarn.historyServer.address to your
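
A sketch of the relevant spark-defaults.conf entries (host name and log directory hypothetical; 18080 is the history server's default port):

    spark.eventLog.enabled            true
    spark.eventLog.dir                hdfs:///user/spark/applicationHistory
    spark.yarn.historyServer.address  historyhost.example.com:18080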

Re: ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...

2015-03-03 Thread Marcelo Vanzin
Weird python errors like this generally mean you have different versions of python in the nodes of your cluster. Can you check that? On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io subscripti...@prismalytics.io wrote: Hi Friends: We noticed the following in 'pyspark' happens when

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
? Thanks a lot for the help -AJ On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com wrote: What are you calling masternode? In yarn-cluster mode, the driver is running somewhere in your cluster, not on the machine where you run spark-submit. The easiest way to get to the Spark UI

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
What are you calling masternode? In yarn-cluster mode, the driver is running somewhere in your cluster, not on the machine where you run spark-submit. The easiest way to get to the Spark UI when using Yarn is to use the Yarn RM's web UI. That will give you a link to the application's UI

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Marcelo Vanzin
.compute.amazonaws.com:9026 shows me all the applications. Do I have to do anything for port 8088, or is what I am seeing at port 9026 good? Attached is a screenshot. Thanks AJ On Mon, Mar 2, 2015 at 4:24 PM, Marcelo Vanzin van...@cloudera.com wrote: That's the RM's RPC port, not the web UI port

Re: Is SPARK_CLASSPATH really deprecated?

2015-03-02 Thread Marcelo Vanzin
. -- Kannan On Thu, Feb 26, 2015 at 6:08 PM, Marcelo Vanzin van...@cloudera.com wrote: On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote: Also, I would like to know if there is a localization overhead when we use spark.executor.extraClassPath. Again, in the case

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
(URLClassLoader.java:355) ... On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote: Guava is not in Spark. (Well, long version: it's in Spark but it's relocated to a different package except for some special classes leaked through the public API.) If your app needs

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote: @Marcelo do you mean by modifying spark.executor.extraClassPath on all workers, that didn’t seem to work? That's an app configuration, not a worker configuration, so if you're trying to set it on the worker configuration

Re: Upgrade to Spark 1.2.1 using Guava

2015-02-27 Thread Marcelo Vanzin
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. I'm suggesting

Re: Spark excludes fastutil dependencies we need

2015-02-26 Thread Marcelo Vanzin
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote: So, should the userClassPathFirst flag work and there is a bug? Sorry for jumping in the middle of conversation (and probably missing some of it), but note that this option applies only to executors. If you're trying to

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Marcelo Vanzin
SPARK_CLASSPATH is definitely deprecated, but my understanding is that spark.executor.extraClassPath is not, so maybe the documentation needs fixing. I'll let someone who might know otherwise comment, though. On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah kra...@maprtech.com wrote:

Re: Error: no snappyjava in java.library.path

2015-02-26 Thread Marcelo Vanzin
Hi Dan, This is a CDH issue, so I'd recommend using cdh-u...@cloudera.org for those questions. This issue is fixed in recent CM 5.3 updates; if you're not using CM, or want a workaround, you can manually configure spark.driver.extraLibraryPath and spark.executor.extraLibraryPath to
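
A sketch of that workaround (the native-library path is illustrative; point it at wherever libsnappyjava lives on your nodes):

    spark-submit \
      --conf spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native \
      --conf spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native \
      ...  # rest of the usual arguments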

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Marcelo Vanzin
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote: Also, I would like to know if there is a localization overhead when we use spark.executor.extraClassPath. Again, in the case of hbase, these jars would be typically available on all nodes. So there is no need to localize

Re: output worker stdout to one place

2015-02-20 Thread Marcelo Vanzin
Hi Anny, You could play with creating your own log4j.properties that will write the output somewhere else (e.g. to some remote mount, or remote syslog). Sorry, but I don't have an example handy. Alternatively, if you can use Yarn, it will collect all logs after the job is finished and make them
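
A sketch of a log4j.properties that ships logs to remote syslog instead of local stdout (host and facility hypothetical; this uses the stock log4j 1.x SyslogAppender):

    log4j.rootCategory=INFO, syslog
    log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
    log4j.appender.syslog.SyslogHost=loghost.example.com
    log4j.appender.syslog.Facility=LOCAL1
    log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
    log4j.appender.syslog.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n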

Re: issue Running Spark Job on Yarn Cluster

2015-02-19 Thread Marcelo Vanzin
You'll need to look at your application's logs. You can use yarn logs --applicationId [id] to see them. On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh sachin.sha...@gmail.com wrote: Hi, I want to run my spark Job in Hadoop yarn Cluster mode, I am using below command - spark-submit --master
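
For example (application id hypothetical; the exact flag spelling may vary between Hadoop versions):

    yarn logs -applicationId application_1424212345678_0001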

Re: Class loading issue, spark.files.userClassPathFirst doesn't seem to be working

2015-02-18 Thread Marcelo Vanzin
Hello, On Tue, Feb 17, 2015 at 8:53 PM, dgoldenberg dgoldenberg...@gmail.com wrote: I've tried setting spark.files.userClassPathFirst to true in SparkConf in my program, also setting it to true in $SPARK-HOME/conf/spark-defaults.conf as Is the code in question running on the driver or in some

Re: Will Spark serialize an entire Object or just the method referred in an object?

2015-02-10 Thread Marcelo Vanzin
the func1 and func2 from jars that are already cached into local nodes? Thanks, Yitong 2015-02-09 14:35 GMT-08:00 Marcelo Vanzin van...@cloudera.com: `func1` and `func2` never get serialized. They must exist on the other end in the form of a class loaded by the JVM. What gets serialized

Re: Will Spark serialize an entire Object or just the method referred in an object?

2015-02-09 Thread Marcelo Vanzin
`func1` and `func2` never get serialized. They must exist on the other end in the form of a class loaded by the JVM. What gets serialized is an instance of a particular closure (the argument to your map function). That's a separate class. The instance of that class that is serialized contains

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, When you run on Yarn, Yarn's libraries are placed in the classpath, and they have precedence over your app's. So, with Spark 1.2, you'll get Guava 11 in your classpath (with Spark 1.1 and earlier you'd get Guava 14 from Spark, so still a problem for you). Right now, the option Markus

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Koert, On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers ko...@tresata.com wrote: do i understand it correctly that on yarn the customer jars are truly placed before the yarn and spark jars on classpath? meaning at container construction time, on the same classloader? that would be great

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers ko...@tresata.com wrote: about putting stuff on classpath before spark or yarn... yeah you can shoot yourself in the foot with it, but since the container is isolated it should be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with

Re: “mapreduce.job.user.classpath.first” for Spark

2015-02-04 Thread Marcelo Vanzin
Hi Corey, On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote: Another suggestion is to build Spark by yourself. I'm having trouble seeing what you mean here, Marcelo. Guava is already shaded to a different package for the 1.2.0 release. It shouldn't be causing conflicts.

Re: Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Marcelo Vanzin
Hi Capitão, Since you're using CDH, your question is probably more appropriate for the cdh-u...@cloudera.org list. The problem you're seeing is most probably an artifact of the way CDH is currently packaged. You have to add Hive jars manually to your Spark app's classpath if you want to use the

Re: Spark on Windows 2008 R2 server does not work

2015-01-28 Thread Marcelo Vanzin
https://issues.apache.org/jira/browse/SPARK-2356 Take a look through the comments, there are some workarounds listed there. On Wed, Jan 28, 2015 at 1:40 PM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote: Has anybody successfully install and run spark-1.2.0 on windows 2008 R2 or

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Marcelo Vanzin
On Thu, Jan 22, 2015 at 10:21 AM, Sean Owen so...@cloudera.com wrote: I think a Spark site would have a lot less traffic. One annoyance is that people can't figure out when to post on SO vs Data Science vs Cross Validated. Another is that a lot of the discussions we see on the Spark users list

Re: spark java options

2015-01-16 Thread Marcelo Vanzin
Hi Kane, What's the complete command line you're using to submit the app? Where do you expect these options to appear? On Fri, Jan 16, 2015 at 11:12 AM, Kane Kim kane.ist...@gmail.com wrote: I want to add some java options when submitting the application: --conf

Re: spark java options

2015-01-16 Thread Marcelo Vanzin
Hi Kane, Here's the command line you sent me privately: ./spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class SimpleApp --conf spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder --master local simpleapp.jar ./test.log You're running the app in local mode. In that
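
In local mode everything runs in the driver JVM, so the options belong on the driver instead; a sketch based on the command above:

    ./spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class SimpleApp \
      --driver-java-options '-XX:+UnlockCommercialFeatures -XX:+FlightRecorder' \
      --master local simpleapp.jar ./test.log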

Re: Error when running SparkPi on Secure HA Hadoop cluster

2015-01-15 Thread Marcelo Vanzin
You're specifying the queue in the spark-submit command line: --queue thequeue Are you sure that queue exists? On Thu, Jan 15, 2015 at 11:23 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, Setup is as follows Hadoop Cluster 2.3.0 (CDH5.0) - Namenode HA - Resource manager HA -

Re: how to run python app in yarn?

2015-01-14 Thread Marcelo Vanzin
As the error message says... On Wed, Jan 14, 2015 at 3:14 PM, freedafeng freedaf...@yahoo.com wrote: Error: Cluster deploy mode is currently not supported for python applications. Use yarn-client instead of yarn-cluster for pyspark apps. -- Marcelo

Re: /tmp directory fills up

2015-01-12 Thread Marcelo Vanzin
Hi Alessandro, You can look for a log line like this in your driver's output: 15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local directory at /data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d If you're deploying your application

Re: How does unmanaged memory work with the executor memory limits?

2015-01-12 Thread Marcelo Vanzin
Short answer: yes. Take a look at: http://spark.apache.org/docs/latest/running-on-yarn.html Look for memoryOverhead. On Mon, Jan 12, 2015 at 2:06 PM, Michael Albert m_albert...@yahoo.com.invalid wrote: Greetings! My executors apparently are being terminated because they are running beyond
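
A sketch of raising that setting (value illustrative, in megabytes; spark.yarn.executor.memoryOverhead was the 1.x config name):

    spark-submit --master yarn-cluster \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      --class com.example.MyApp myapp.jar  # hypothetical app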

Re: Running spark 1.2 on Hadoop + Kerberos

2015-01-08 Thread Marcelo Vanzin
Hi Manoj, As long as you're logged in (i.e. you've run kinit), everything should just work. You can run klist to make sure you're logged in. On Thu, Jan 8, 2015 at 3:49 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, For running spark 1.2 on Hadoop cluster with Kerberos, what spark
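
For example (principal hypothetical):

    kinit alice@EXAMPLE.COM                  # obtain a Kerberos ticket
    klist                                    # verify the ticket is valid
    spark-submit --master yarn-cluster ...   # then submit as usual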

Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
I ran this with CDH 5.2 without a problem (sorry don't have 5.3 readily available at the moment): $ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/\*' $ spark-submit --driver-class-path $HBASE --conf spark.executor.extraClassPath=$HBASE --master yarn --class org.apache.spark.examples.HBaseTest

Re: Running spark 1.2 on Hadoop + Kerberos

2015-01-08 Thread Marcelo Vanzin
On Thu, Jan 8, 2015 at 4:09 PM, Manoj Samel manojsamelt...@gmail.com wrote: Some old communication (Oct 14) says Spark is not certified with Kerberos. Can someone comment on this aspect ? Spark standalone doesn't support kerberos. Spark running on top of Yarn works fine with kerberos. --

Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng freedaf...@yahoo.com wrote: I installed the custom build in standalone mode as normal. The master and slaves started successfully. However, I got an error when I ran a job. It seems to me from the error message that some library was compiled against hadoop1,

Re: SparkSQL

2015-01-08 Thread Marcelo Vanzin
Disclaimer: this seems more of a CDH question, I'd suggest sending these to the CDH mailing list in the future. CDH 5.2 actually has Spark 1.1. It comes with SparkSQL built-in, but it does not include the thrift server because of incompatibilities with the CDH version of Hive. To use Hive

Re: correct/best way to install custom spark1.2 on cdh5.3.0?

2015-01-08 Thread Marcelo Vanzin
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org. But the question I'd like to ask is: why do you need your own Spark build? What's wrong with CDH's Spark that it doesn't work for you? On Thu, Jan 8, 2015 at 3:01 PM, freedafeng freedaf...@yahoo.com wrote: Could anyone come

Re: Spark History Server can't read event logs

2015-01-08 Thread Marcelo Vanzin
and the user that runs Spark in our case is a unix ID called mapr (in the mapr group). Therefore, this can't read my job event logs as shown above. Thanks, Michael -Original Message- From: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: 07 January 2015 18:10 To: England, Michael (IT/UK

Re: Spark History Server can't read event logs

2015-01-08 Thread Marcelo Vanzin
Nevermind my last e-mail. HDFS complains about not understanding 3777... On Thu, Jan 8, 2015 at 9:46 AM, Marcelo Vanzin van...@cloudera.com wrote: Hmm. Can you set the permissions of /apps/spark/historyserver/logs to 3777? I'm not sure HDFS respects the group id bit, but it's worth a try. (BTW

Re: Spark History Server can't read event logs

2015-01-08 Thread Marcelo Vanzin
Sorry for the noise; but I just remembered you're actually using MapR (and not HDFS), so maybe the 3777 trick could work... On Thu, Jan 8, 2015 at 10:32 AM, Marcelo Vanzin van...@cloudera.com wrote: Nevermind my last e-mail. HDFS complains about not understanding 3777... On Thu, Jan 8, 2015

Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException

2015-01-08 Thread Marcelo Vanzin
Just to add to Sandy's comment, check your client configuration (generally in /etc/spark/conf). If you're using CM, you may need to run the Deploy Client Configuration command on the cluster to update the configs to match the new version of CDH. On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza

Re: spark 1.1 got error when working with cdh5.3.0 standalone mode

2015-01-07 Thread Marcelo Vanzin
This could be caused by many things, including wrong configuration. Hard to tell with just the info you provided. Is there any reason why you want to use your own Spark instead of the one shipped with CDH? CDH 5.3 has Spark 1.2, so unless you really need to run Spark 1.1, you should be better off

Re: spark-network-yarn 2.11 depends on spark-network-shuffle 2.10

2015-01-07 Thread Marcelo Vanzin
This particular case shouldn't cause problems since both of those libraries are java-only (the scala version appended there is just for helping the build scripts). But it does look weird, so it would be nice to fix it. On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com

Re: Spark History Server can't read event logs

2015-01-07 Thread Marcelo Vanzin
The Spark code generates the log directory with 770 permissions. On top of that you need to make sure of two things: - all directories up to /apps/spark/historyserver/logs/ are readable by the user running the history server - the user running the history server belongs to the group that owns
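
A sketch of fixing that up with plain filesystem commands (the group name is hypothetical; use whatever group the history server user actually belongs to):

    hdfs dfs -chgrp -R spark /apps/spark/historyserver/logs
    hdfs dfs -chmod -R 770 /apps/spark/historyserver/logs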

Re: different akka versions and spark

2015-01-05 Thread Marcelo Vanzin
Spark doesn't really shade akka; it pulls a different build (kept under the org.spark-project.akka group and, I assume, with some build-time differences from upstream akka?), but all classes are still in the original location. The upgrade is a little more unfortunate than just changing akka,

Re: Who manage the log4j appender while running spark on yarn?

2014-12-22 Thread Marcelo Vanzin
If you don't specify your own log4j.properties, Spark will load the default one (from core/src/main/resources/org/apache/spark/log4j-defaults.properties, which ends up being packaged with the Spark assembly). You can easily override the config file if you want to, though; check the Debugging

Re: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Marcelo Vanzin
On Fri, Dec 19, 2014 at 4:05 PM, Haopu Wang hw...@qilinsoft.com wrote: My application doesn't depend on hadoop-client directly. It only depends on spark-core_2.10, which depends on hadoop-client 1.0.4. This can be checked in the Maven repository at

Re: Yarn not running as many executors as I'd like

2014-12-19 Thread Marcelo Vanzin
How many cores / memory do you have available per NodeManager, and how many cores / memory are you requesting for your job? Remember that in Yarn mode, Spark launches num executors + 1 containers. The extra container, by default, reserves 1 core and about 1g of memory (more if running in cluster

Re: SPARK-2243 Support multiple SparkContexts in the same JVM

2014-12-17 Thread Marcelo Vanzin
Hi Anton, That could solve some of the issues (I've played with that a little bit). But there are still some areas where this would be sub-optimal, because Spark still uses system properties in some places and those are global, not per-class loader. (SparkSubmit is the biggest offender here, but

Re: Spark Server - How to implement

2014-12-11 Thread Marcelo Vanzin
it as a public API, but mostly for internal Hive use. It can give you a few ideas, though. Also, SPARK-3215. On Thu, Dec 11, 2014 at 5:41 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Manoj, I'm not aware of any public projects that do something like that, except for the Ooyala server which you say

Re: Spark Server - How to implement

2014-12-11 Thread Marcelo Vanzin
Hi Manoj, I'm not aware of any public projects that do something like that, except for the Ooyala server which you say doesn't cover your needs. We've been playing with something like that inside Hive, though: On Thu, Dec 11, 2014 at 5:33 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi,

Re: Spark 1.0.0 Standalone mode config

2014-12-10 Thread Marcelo Vanzin
Hello, What do you mean by app that uses 2 cores and 8G of RAM? Spark apps generally involve multiple processes. The command line options you used affect only one of them (the driver). You may want to take a look at similar configuration for executors. Also, check the documentation:

Re: spark shell and hive context problem

2014-12-09 Thread Marcelo Vanzin
Hello, In CDH 5.2 you need to manually add Hive classes to the classpath of your Spark job if you want to use the Hive integration. Also, be aware that since Spark 1.1 doesn't really support the version of Hive shipped with CDH 5.2, this combination is to be considered extremely experimental. On

Re: How to incrementally compile spark examples using mvn

2014-12-05 Thread Marcelo Vanzin
wrote: Thank you, Marcelo and Sean, mvn install is a good answer for my demands. -Original Message- From: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: 2014-11-21 1:47 To: yiming zhang Cc: Sean Owen; user@spark.apache.org Subject: Re: How to incrementally compile spark examples using mvn Hi

Re: How to incrementally compile spark examples using mvn

2014-12-05 Thread Marcelo Vanzin
and got weird errors because some toy version I once built was stuck in my local maven repo and somehow got priority over a real maven repo). On Fri, Dec 5, 2014 at 5:28 PM, Marcelo Vanzin van...@cloudera.com wrote: You can set SPARK_PREPEND_CLASSES=1 and it should pick up your new mllib
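
A sketch of that workflow (module name illustrative):

    export SPARK_PREPEND_CLASSES=1
    mvn -pl :spark-mllib_2.10 compile   # rebuild only the module you changed
    ./bin/spark-shell                   # picks up the freshly compiled classes
    unset SPARK_PREPEND_CLASSES         # back to the assembly's classes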

Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 11:22 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Any suggestion on how can user with custom Hadoop jar solve this issue? You'll need to include all the dependencies for that custom Hadoop jar to the classpath. Those will include Guava (which is not included in

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
Hello, On Mon, Nov 24, 2014 at 12:07 PM, aecc alessandroa...@gmail.com wrote: This is the stacktrace: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA - field (class

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
On Mon, Nov 24, 2014 at 1:56 PM, aecc alessandroa...@gmail.com wrote: I checked sqlContext, they use it in the same way I would like to use my class: they make the class Serializable with transient. Does this somehow affect the whole pipeline of data movement? I mean, will I get performance

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
That's an interesting question for which I do not know the answer. Probably a question for someone with more knowledge of the internals of the shell interpreter... On Mon, Nov 24, 2014 at 2:19 PM, aecc alessandroa...@gmail.com wrote: Ok, great, I'm gonna do it that way, thanks :). However I

Re: Is there a way to turn on spark eventLog on the worker node?

2014-11-24 Thread Marcelo Vanzin
Hello, What exactly are you trying to see? Workers don't generate any events that would be logged by enabling that config option. Workers generate logs, and those are captured and saved to disk by the cluster manager, generally, without you having to do anything. On Mon, Nov 24, 2014 at 7:46 PM,

Re: How to incrementally compile spark examples using mvn

2014-11-20 Thread Marcelo Vanzin
Hi Yiming, On Wed, Nov 19, 2014 at 5:35 PM, Yiming (John) Zhang sdi...@gmail.com wrote: Thank you for your reply. I was wondering whether there is a method of reusing locally-built components without installing them? That is, if I have successfully built the spark project as a whole, how

Re: spark-submit and logging

2014-11-20 Thread Marcelo Vanzin
Check the --files argument in the output of spark-submit -h. On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell matt.narr...@gmail.com wrote: How do I configure the files to be uploaded to YARN containers? So far, I’ve only seen --conf spark.yarn.jar=hdfs://….” which allows me to specify the HDFS

Re: spark-submit and logging

2014-11-20 Thread Marcelo Vanzin
Hi Tobias, With the current Yarn code, packaging the configuration in your app's jar and adding the -Dlog4j.configuration=log4jConf.xml argument to the extraJavaOptions configs should work. That's not the recommended way for get it to work, though, since this behavior may change in the future.

Re: spark-shell giving me error of unread block data

2014-11-19 Thread Marcelo Vanzin
Hi Anson, We've seen this error when incompatible classes are used in the driver and executors (e.g., same class name, but the classes are different and thus the serialized data is different). This can happen for example if you're including some 3rd party libraries in your app's jar, or changing

Re: spark-shell giving me error of unread block data

2014-11-19 Thread Marcelo Vanzin
are using CDH's version of Spark, not trying to run an Apache Spark release on top of CDH, right? (If that's the case, then we could probably move this conversation to cdh-us...@cloudera.org, since it would be CDH-specific.) On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote

Re: Spark on YARN

2014-11-18 Thread Marcelo Vanzin
Can you check in your RM's web UI how much of each resource does Yarn think you have available? You can also check that in the Yarn configuration directly. Perhaps it's not configured to use all of the available resources. (If it was set up with Cloudera Manager, CM will reserve some room for

Re: How to incrementally compile spark examples using mvn

2014-11-15 Thread Marcelo Vanzin
I haven't tried scala:cc, but you can ask maven to just build a particular sub-project. For example: mvn -pl :spark-examples_2.10 compile On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote: Hi, I have already successfully compile and run spark examples. My problem

Re: Backporting spark 1.1.0 to CDH 5.1.3

2014-11-10 Thread Marcelo Vanzin
Hello, CDH 5.1.3 ships with a version of Hive that's not entirely the same as the Hive Spark 1.1 supports. So when building your custom Spark, you should make sure you change all the dependency versions to point to the CDH versions. IIRC Spark depends on org.spark-project.hive:0.12.0, you'd have

Re: How to avoid use snappy compression when saveAsSequenceFile?

2014-11-05 Thread Marcelo Vanzin
On Mon, Oct 27, 2014 at 7:37 PM, buring qyqb...@gmail.com wrote: Here is error log,I abstract as follows: INFO [binaryTest---main]: before first WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):

Re: SparkContext.stop() ?

2014-10-31 Thread Marcelo Vanzin
Actually, if you don't call SparkContext.stop(), the event log information that is used by the history server will be incomplete, and your application will never show up in the history server's UI. If you don't use that functionality, then you're probably ok not calling it as long as your

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Marcelo Vanzin
resource or 2) add dynamic resource management for Yarn mode is very much wanted. Jianshi On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin van...@cloudera.com wrote: On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: That's not something you might want to do usually

Re: JavaHiveContext class not found error. Help!!

2014-10-23 Thread Marcelo Vanzin
Hello there, This is more of a question for the cdh-users list, but in any case... In CDH 5.1 we skipped packaging of the Hive module in SparkSQL. That has been fixed in CDH 5.2, so if it's possible for you I'd recommend upgrading. On Thu, Oct 23, 2014 at 2:53 PM, nitinkak001

Re: Exceptions not caught?

2014-10-23 Thread Marcelo Vanzin
On Thu, Oct 23, 2014 at 3:40 PM, ankits ankitso...@gmail.com wrote: 2014-10-23 15:39:50,845 ERROR [] Exception in task 1.0 in stage 1.0 (TID 1) java.io.IOException: org.apache.thrift.protocol.TProtocolException: This looks like an exception that's happening on an executor and just being

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
Hi Ashwin, Let me try to answer to the best of my knowledge. On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar ashwinshanka...@gmail.com wrote: Here are my questions : 1. Sharing spark context : How exactly multiple users can share the cluster using same spark context ? That's not

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Marcelo Vanzin
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar ashwinshanka...@gmail.com wrote: That's not something you might want to do usually. In general, a SparkContext maps to a user application My question was basically this. In this page in the official doc, under Scheduling within an application

Re: how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread Marcelo Vanzin
On top of what Andrew said, you shouldn't need to manually add the mllib jar to your jobs; it's already included in the Spark assembly jar. On Thu, Oct 16, 2014 at 11:51 PM, eric wong win19...@gmail.com wrote: Hi, I'm using the comma-separated style to submit multiple jar files in the follow
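
For example (jar names hypothetical), --jars takes one comma-separated list, and mllib needs no entry because it is in the assembly:

    spark-submit --class com.example.MyApp \
      --jars /path/to/dep1.jar,/path/to/dep2.jar \
      myapp.jar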

Re: Spark assembly for YARN/CDH5

2014-10-16 Thread Marcelo Vanzin
Hi Philip, The assemblies are part of the CDH distribution. You can get them here: http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-5-2-0.html As of Spark 1.1 (and, thus, CDH 5.2), assemblies are not published to maven repositories anymore (you can see commit [1] for details). [1]

Re: spark-sql not coming up with Hive 0.10.0/CDH 4.6

2014-10-15 Thread Marcelo Vanzin
Hi Anurag, Spark SQL (from the Spark standard distribution / sources) currently requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not gonna work. CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can talk to the Hive 0.13.1 that is also bundled with CDH, so if

Re: SPARK_SUBMIT_CLASSPATH question

2014-10-15 Thread Marcelo Vanzin
Hi Greg, I'm not sure exactly what it is that you're trying to achieve, but I'm pretty sure those variables are not supposed to be set by users. You should take a look at the documentation for spark.driver.extraClassPath and spark.driver.extraLibraryPath, and the equivalent options for executors.

Re: how to set log level of spark executor on YARN(using yarn-cluster mode)

2014-10-15 Thread Marcelo Vanzin
Hi Eric, Check the Debugging Your Application section at: http://spark.apache.org/docs/latest/running-on-yarn.html Long story short: upload your log4j.properties using the --files argument of spark-submit. (Mental note: we could make the log level configurable via a system property...) On
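
For example (file locations and app details hypothetical):

    spark-submit --master yarn-cluster \
      --files /local/path/log4j.properties \
      --class com.example.MyApp myapp.jar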

Re: Application details for failed and teminated jobs

2014-10-02 Thread Marcelo Vanzin
You may want to take a look at this PR: https://github.com/apache/spark/pull/1558 Long story short: while not a terrible idea to show running applications, your particular case should be solved differently. Applications are responsible for calling SparkContext.stop() at the end of their run,

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Marcelo Vanzin
You can't set the driver memory programmatically in client mode. In that mode, the same JVM is running the driver, so you can't modify command line options anymore when initializing the SparkContext. (And you can't really start cluster mode apps that way, so the only way to set this is through
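
So the setting has to be given to spark-submit (or spark-defaults.conf) instead; for example (value and script name illustrative):

    spark-submit --driver-memory 4g my_script.py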

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Marcelo Vanzin
in a few different contexts, but I don't think there's an official solution yet.) On Wed, Oct 1, 2014 at 9:59 AM, Tamas Jambor jambo...@gmail.com wrote: thanks Marcelo. What's the reason it is not possible in cluster mode, either? On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com

Re: spark.driver.memory is not set (pyspark, 1.1.0)

2014-10-01 Thread Marcelo Vanzin
No, you can't instantiate a SparkContext to start apps in cluster mode. For Yarn, for example, you'd have to call directly into org.apache.spark.deploy.yarn.Client; that class will tell the Yarn cluster to launch the driver for you and then instantiate the SparkContext. On Wed, Oct 1, 2014 at
