Re: error when running spark from oozie launcher

2016-08-18 Thread Jean-Baptiste Onofré
It sounds like a mismatch between the Spark version shipped in Oozie and the
runtime one.
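
If that is the case, one way to check (assuming the Oozie sharelib CLI is
available in your setup) is to compare the spark-* jars in the Oozie Spark
sharelib with the Spark version installed on the cluster, e.g.:

  oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist spark

and make sure the listed spark-launcher jar matches the runtime Spark.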

Regards
JB



On Aug 18, 2016, at 07:36, tkg_cangkul wrote:
>Hi Olivier, thanks for your reply.
>
>this is the full stacktrace :
>
>Failing Oozie Launcher, Main class 
>[org.apache.oozie.action.hadoop.SparkMain], main() threw exception, 
>org.apache.spark.launcher.CommandBuilderUtils.addPermGenSizeOpt(Ljava/util/List;)V
>java.lang.NoSuchMethodError: 
>org.apache.spark.launcher.CommandBuilderUtils.addPermGenSizeOpt(Ljava/util/List;)V
>
>On 18/08/16 13:28, Olivier Girardot wrote:
>> this is not the full stacktrace, please post the full stacktrace if 
>> you want some help
>>
>>
>>
>> On Wed, Aug 17, 2016 7:24 PM, tkg_cangkul yuza.ras...@gmail.com 
>>  wrote:
>>
>> Hi, I am trying to submit a Spark job with Oozie, but I have one problem
>> here: when I submit the same job, sometimes it succeeds and sometimes
>> it fails.
>>
>> i've got this error message when the job was failed :
>>
>>
>org.apache.spark.launcher.CommandBuilderUtils.addPermGenSizeOpt(Ljava/util/List;)V
>>
>> Can anyone help me solve this? I have tried setting
>> -XX:MaxPermSize=512m -XX:PermSize=256m in the
>> spark.driver.extraJavaOptions property, but that did not help.
>>
>>
>>
>> Olivier Girardot | Associé
>> o.girar...@lateral-thoughts.com
>
>> +33 6 24 09 17 94


Re: Aggregations with scala pairs

2016-08-18 Thread Jean-Baptiste Onofré
Agreed.

Regards
JB



On Aug 18, 2016, at 07:32, Olivier Girardot wrote:
>CC'ing dev list, you should open a Jira and a PR related to it to
>discuss it c.f.
>https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges
>
>
>
>
>
>On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote:
>Hello, I'd like to report incorrect behavior of the Dataset API, but I don't
>know how to do that: my Jira account doesn't allow me to create an issue.
>I'm using Apache Spark 2.0.0, but the problem has existed since at least
>version 1.4 (and, given the doc, probably since 1.3).
>The problem, as well as the workaround, is simple to reproduce: if we apply
>agg over a Dataset with Scala pairs on the same column, only one aggregation
>over that column is actually used. This is because the toMap call reduces the
>pair values with the same key to one, overwriting the earlier value.
>The relevant class is
>https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>
>
>def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>  agg((aggExpr +: aggExprs).toMap)
>}
>
>Rewritten as something like the following, it should work:
>
>def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>  toDF((aggExpr +: aggExprs).map { pairExpr =>
>    strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>  }.toSeq)
>}
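>
>For illustration (the dataset and column names here are made up), a pair-based
>call such as:
>
>  ds.groupBy("dept").agg("salary" -> "min", "salary" -> "max")
>
>ends up applying only max("salary"), because the two pairs share the key
>"salary" and collapse to a single Map entry. Until this is fixed, a workaround
>is to pass Column expressions instead of pairs:
>
>  import org.apache.spark.sql.functions.{min, max}
>  ds.groupBy("dept").agg(min("salary"), max("salary"))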
>
>regards --
>Ing. Ivaldi Andres
>
>
>Olivier Girardot | Associé
>o.girar...@lateral-thoughts.com
>+33 6 24 09 17 94


Re: Spark Streaming UI duration numbers mismatch

2016-03-23 Thread Jean-Baptiste Onofré

Hi Jatin,

I will reproduce tomorrow and take a look.

Did you already create a Jira about that (I don't think so) ? If I 
reproduce the problem (and it's really a problem), then I will create 
one for you.


Thanks,
Regards
JB

On 03/23/2016 08:20 PM, Jatin Kumar wrote:

Hello,

Can someone please provide some help on the below issue?

--
Thanks
Jatin

On Tue, Mar 22, 2016 at 3:30 PM, Jatin Kumar <jku...@rocketfuelinc.com
<mailto:jku...@rocketfuelinc.com>> wrote:

Hello all,

I am running spark streaming application and the duration numbers on
batch page and job page don't match. Please find attached
screenshots of the same.

IMO processing time on batch page at top should be sum of durations
of all jobs and similarly the duration of a job reported on batch
page should be sum of durations of stages of that job.

--
Thanks
Jatin




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Apache Beam Spark runner

2016-03-19 Thread Jean-Baptiste Onofré

Hi Amit,

well done ;)

I'm reviewing it now (I didn't have time to do it before, sorry about that).

Regards
JB

On 03/17/2016 06:26 PM, Sela, Amit wrote:

Hi all,

The Apache Beam Spark runner is now available at:
https://github.com/apache/incubator-beam/tree/master/runners/spark Check
it out!
The Apache Beam (http://beam.incubator.apache.org/) project is a unified
model for building data pipelines using Google’s Dataflow programming
model, and now it supports Spark as well!

Take it for a ride on your Spark cluster!

Thanks,
Amit




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Jean-Baptiste Onofré

Hi Rachana,

don't you have two messages on the kafka broker ?
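
Also note that an accumulator updated inside a transformation (the map here)
can be applied more than once when the RDD is evaluated several times: in your
snippet, rdd.isEmpty() and rdd.saveAsTextFile() each trigger an evaluation, and
task retries or speculation have the same effect. Spark only guarantees
exactly-once accumulator updates for updates performed inside actions. A
minimal sketch of the effect (the names are just for illustration):

  val acc = sc.accumulator(0)
  val mapped = rdd.map { x => acc += 1; x }   // accumulator updated in a transformation
  if (!mapped.isEmpty()) {                    // evaluates at least the first partition
    mapped.saveAsTextFile(outputPath)         // evaluates the whole RDD again
  }
  // records of the partition touched by isEmpty() are counted twice in acc

Caching the mapped RDD, or doing the counting inside foreachRDD, avoids the
double update.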

Regards
JB

On 01/05/2016 05:14 PM, Rachana Srivastava wrote:

I have a very simple two-line program. I am getting input from Kafka, saving
the input to a file, and counting the inputs received. My code looks like the
following; when I run it I am getting two accumulator counts for each input.

HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092");
kafkaParams.put("zookeeper.connect", "localhost:2181");

JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
    kafkaParams, topicsSet);

final Accumulator accum = jssc.sparkContext().accumulator(0);

JavaDStream lines = messages.map(
    new Function<Tuple2<String, String>, String>() {
      public String call(Tuple2<String, String> tuple2) {
        accum.add(1);
        return tuple2._2();
      }
    });

lines.foreachRDD(new Function<JavaRDD, Void>() {
  public Void call(JavaRDD rdd) throws Exception {
    if (!rdd.isEmpty() || !rdd.partitions().isEmpty()) {
      rdd.saveAsTextFile("hdfs://quickstart.cloudera:8020/user/cloudera/testDirJan4/test1.text");
    }
    System.out.println(" &&&&&&&&&&&&&&&&&&&&& COUNT OF ACCUMULATOR IS " + accum.value());
    return null;
  }
});

jssc.start();

If I comment out rdd.saveAsTextFile I get the correct count, but with
rdd.saveAsTextFile I am getting multiple accumulator counts for each input.

Thanks,

Rachana



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Is Spark 1.6 released?

2016-01-04 Thread Jean-Baptiste Onofré

Hi Jung,

yes, Spark 1.6.0 was released on December 28th.

The artifacts are on Maven Central:

http://repo1.maven.org/maven2/org/apache/spark/
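
So a build can already pull 1.6.0 from Central; as an illustration, the sbt
coordinates would be:

  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"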

However, the distribution is not available on dist.apache.org:

https://dist.apache.org/repos/dist/release/spark/

Let me check with the team to upload the distribution to dist.apache.org.

Regards
JB

On 01/04/2016 10:06 AM, Jung wrote:

Hi
There were Spark 1.6 jars in maven central and github.
I found it 5 days ago. But it doesn't appear on Spark website now.
May I regard Spark 1.6 zip file in github as a stable release?

Thanks
Jung



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Is Spark 1.6 released?

2016-01-04 Thread Jean-Baptiste Onofré

It's now OK: Michael published and announced the release.

Sorry for the delay.

Regards
JB

On 01/04/2016 10:06 AM, Jung wrote:

Hi
There were Spark 1.6 jars in maven central and github.
I found it 5 days ago. But it doesn't appear on Spark website now.
May I regard Spark 1.6 zip file in github as a stable release?

Thanks
Jung



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Download Problem with Spark 1.5.2 pre-built for Hadoop 1.X

2015-12-17 Thread Jean-Baptiste Onofré

Hi,

we have a Jira about that (let me find it): by default, a suffix is
appended, causing an issue resolving the artifact.


Let me find the Jira and the workaround.

Regards
JB

On 12/17/2015 12:48 PM, abc123 wrote:

I get an error message when I try to download Spark 1.5.2 pre-built for Hadoop
1.X. Can someone help me please?

Error:
http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop1.tgz

NoSuchKey
The specified key does not exist.
spark-1.5.2-bin-hadoop1.tgz
CEA88FA320236296pWBY80tIxCQw5el9YdjTKta2aKAvIuKvo51RpnaU7YtxMxu37aki32yAWWnf2Qb8Lu+2zzI4msM=




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Download-Problem-with-Spark-1-5-2-pre-built-for-Hadoop-1-X-tp25726.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Compiling ERROR for Spark MetricsSystem

2015-12-11 Thread Jean-Baptiste Onofré
la:431) at
scala.collection.Iterator$class.foreach(Iterator.scala:727) at
scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at
scala.tools.nsc.Global$GlobalPhase.run(Global.scala:431) at
scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583) at
scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557) at
scala.tools.nsc.Global$Run.compileSources(Global.scala:1553) at
scala.tools.nsc.Global$Run.compile(Global.scala:1662) at
xsbt.CachedCompiler0.run(CompilerInterface.scala:115) at
xsbt.CachedCompiler0.run(CompilerInterface.scala:94) at
xsbt.CompilerInterface.run(CompilerInterface.scala:22) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) at
sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) at
sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) at
org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29)
at
org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26)
at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67)
at
org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24)
at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)

The IDE I'm using is JetBrains IntelliJ IDEA 15 CE.

I suspect it's a classpath issue, but I am not sure how to debug it further.

Thanks Haijia



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark Submit - java.lang.IllegalArgumentException: requirement failed

2015-12-11 Thread Jean-Baptiste Onofré

Hi Nick,

the localizedPath has to be non-null; that's why the requirement fails.

In the SparkConf used by spark-submit (by default from conf/spark-defaults.conf),
do you have all the required properties defined, especially spark.yarn.keytab?
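
For example, a minimal YARN security setup in spark-defaults.conf would look
like this (the keytab path and principal below are placeholders):

  spark.yarn.keytab      /etc/security/keytabs/nick.keytab
  spark.yarn.principal   nick@EXAMPLE.COM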


Thanks,
Regards
JB

On 12/11/2015 05:49 PM, Afshartous, Nick wrote:


Hi,


I'm trying to run a streaming job on a single-node EMR 4.1/Spark 1.5
cluster. It's throwing an IllegalArgumentException right away on submit.

Attaching full output from console.


Thanks for any insights.

--

 Nick



15/12/11 16:44:43 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
15/12/11 16:44:43 INFO client.RMProxy: Connecting to ResourceManager at
ip-10-247-129-50.ec2.internal/10.247.129.50:8032
15/12/11 16:44:43 INFO yarn.Client: Requesting a new application from
cluster with 1 NodeManagers
15/12/11 16:44:43 INFO yarn.Client: Verifying our application has not
requested more than the maximum memory capability of the cluster (54272
MB per container)
15/12/11 16:44:43 INFO yarn.Client: Will allocate AM container, with
11264 MB memory including 1024 MB overhead
15/12/11 16:44:43 INFO yarn.Client: Setting up container launch context
for our AM
15/12/11 16:44:43 INFO yarn.Client: Setting up the launch environment
for our AM container
15/12/11 16:44:43 INFO yarn.Client: Preparing resources for our AM container
15/12/11 16:44:44 INFO yarn.Client: Uploading resource
file:/usr/lib/spark/lib/spark-assembly-1.5.0-hadoop2.6.0-amzn-1.jar ->
hdfs://ip-10-247-129-50.ec2.internal:8020/user/hadoop/.sparkStaging/application_1447\
442727308_0126/spark-assembly-1.5.0-hadoop2.6.0-amzn-1.jar
15/12/11 16:44:44 INFO metrics.MetricsSaver: MetricsConfigRecord
disabledInCluster: false instanceEngineCycleSec: 60
clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072
maxInstanceCount: 500\
  lastModified: 1447442734295
15/12/11 16:44:44 INFO metrics.MetricsSaver: Created MetricsSaver
j-2H3BTA60FGUYO:i-f7812947:SparkSubmit:15603 period:60
/mnt/var/em/raw/i-f7812947_20151211_SparkSubmit_15603_raw.bin
15/12/11 16:44:45 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay
1276 raw values into 1 aggregated values, total 1
15/12/11 16:44:45 INFO yarn.Client: Uploading resource
file:/home/hadoop/spark-pipeline-framework-1.1.6-SNAPSHOT/workflow/lib/spark-kafka-services-1.0.jar
-> hdfs://ip-10-247-129-50.ec2.internal:8020/user/hadoo\
p/.sparkStaging/application_1447442727308_0126/spark-kafka-services-1.0.jar
15/12/11 16:44:45 INFO yarn.Client: Uploading resource
file:/home/hadoop/spark-pipeline-framework-1.1.6-SNAPSHOT/conf/AwsCredentials.properties
-> hdfs://ip-10-247-129-50.ec2.internal:8020/user/hadoop/.sparkSta\
ging/application_1447442727308_0126/AwsCredentials.properties
15/12/11 16:44:45 WARN yarn.Client: Resource
file:/home/hadoop/spark-pipeline-framework-1.1.6-SNAPSHOT/conf/AwsCredentials.properties
added multiple times to distributed cache.
15/12/11 16:44:45 INFO yarn.Client: Deleting staging directory
.sparkStaging/application_1447442727308_0126
Exception in thread "main" java.lang.IllegalArgumentException:
requirement failed
 at scala.Predef$.require(Predef.scala:221)
 at
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
 at
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:390)
 at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 at
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:390)
 at
org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:388)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at
org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:388)
 at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
 at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
 at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
 at org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
 at org.apache.spark.deploy.yarn.Client.main(Client.scala)




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Jean-Baptiste Onofré

Hi Nick,

Just to be sure: don't you see some ClassCastException in the log ?

Thanks,
Regards
JB

On 12/10/2015 07:56 PM, Nick Pentreath wrote:

Could you provide an example / test case and more detail on what issue
you're facing?

I've just tested a simple program reading from a dev Kinesis stream and
using stream.print() to show the records, and it works under 1.5.1 but
doesn't appear to be working under 1.5.2.

UI for 1.5.2:

Inline image 1

UI for 1.5.1:

Inline image 2

On Thu, Dec 10, 2015 at 5:50 PM, Brian London <brianmlon...@gmail.com
<mailto:brianmlon...@gmail.com>> wrote:

Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
Kinesis ASL that ships with 1.5.2 appears to not work for me
although 1.5.1 is fine. I spent some time with Amazon earlier in the
week and the only thing we could do to make it work is to change the
version to 1.5.1.  Can someone please attempt to reproduce before I
open a JIRA issue for it?




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to test https://issues.apache.org/jira/browse/SPARK-10648 fix

2015-12-03 Thread Jean-Baptiste Onofré

Hi Rajesh,

you can check out the codebase and build it yourself in order to test:

git clone https://git-wip-us.apache.org/repos/asf/spark
cd spark
mvn clean package -DskipTests

You will have bin, sbin and conf folders to try it.
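
If you specifically want the 1.6 line rather than master, you can check out the
corresponding branch before building (assuming the fix was merged there), e.g.:

  git checkout branch-1.6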

Regards
JB

On 12/03/2015 09:39 AM, Madabhattula Rajesh Kumar wrote:

Hi Team,

Looks like this issue is fixed in the 1.6 release. How can I test this fix? Is
any jar available, so I can add it as a dependency and test the fix? Or is
there any other way I can test this fix on the 1.5.2 code base?

Could you please let me know the steps. Thank you for your support

Regards,
Rajesh


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How and where to update release notes for spark rel 1.6?

2015-12-03 Thread Jean-Baptiste Onofré

Hi Ravi,

Even if it's not perfect, you can take a look at the current
release notes on Jira:


https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12333083

Regards
JB

On 12/03/2015 12:01 PM, RaviShankar KS wrote:

Hi,

How and where to update release notes for spark rel 1.6?
pls help.

There are a few methods with changed params, and a few deprecated ones
that need to be documented.

Thanks
Ravi


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Standalone cluster not using multiple workers for single application

2015-11-02 Thread Jean-Baptiste Onofré

Hi Jeff,

it may depend on your application code.

To verify your setup and whether you are able to scale across multiple workers,
you can try the SparkTC example for instance (it should use all workers).
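
For instance, something like the following should spread across the cluster
(the master host is a placeholder and the examples jar name depends on your
build/distribution):

  spark-submit --master spark://<master-host>:7077 \
    --class org.apache.spark.examples.SparkTC \
    lib/spark-examples-*.jar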


Regards
JB

On 11/02/2015 08:56 PM, Jeff Jones wrote:

I’ve got a series of applications using a single standalone Spark
cluster (v1.4.1).  The cluster has 1 master and 4 workers (4 CPUs per
worker node). I am using the start-slave.sh script to launch the worker
process on each node and I can see the nodes were successfully
registered using the SparkUI.  When I launch one of my applications
regardless of what I set spark.cores.max to when instantiating the
SparkContext in the driver app I seem to get a single worker assigned to
the application and all jobs that get run.  For example, if I set
spark.cores.max to 16 the SparkUI will show a single worker take the
load with 4 (16 Used) in the Cores column.  How do I get my jobs run
across multiple nodes in the cluster?

Here’s a snippet from the SparkUI (IP addresses removed for privacy)


Workers

Worker Id                        Address     State  Cores        Memory
worker-20150920064814-***-33659  ***:33659   ALIVE  4 (0 Used)   28.0 GB (0.0 B Used)
worker-20151012175609-***37399   ***:37399   ALIVE  4 (16 Used)  28.0 GB (28.0 GB Used)
worker-20151012181934-***-36573  ***:36573   ALIVE  4 (4 Used)   28.0 GB (28.0 GB Used)
worker-20151030170514-***-45368  ***:45368   ALIVE  4 (0 Used)   28.0 GB (0.0 B Used)


Running Applications

Application ID           Name  Cores  Memory per Node  Submitted Time       User  State    Duration
app-20151102194733-0278  App1  16     28.0 GB          2015/11/02 19:47:33  ***   RUNNING  2 s
app-20151102164156-0274  App2  4      28.0 GB          2015/11/02 16:41:56  ***   RUNNING  3.1 h

Jeff


This message (and any attachments) is intended only for the designated
recipient(s). It
may contain confidential or proprietary information, or have other
limitations on use as
indicated by the sender. If you are not a designated recipient, you may
not review, use,
copy or distribute this message. If you received this in error, please
notify the sender by
reply e-mail and delete this message.


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Jean-Baptiste Onofré

Just to be sure: you use yarn cluster (not standalone), right ?

Regards
JB

On 11/02/2015 10:37 AM, Balachandar R.A. wrote:

Yes. In two different places I use spark://

1. In my code, while creating spark configuration, I use the code below

val sConf = new SparkConf().setAppName("Dummy").setMaster("spark://:7077")
val sc = new SparkContext(sConf)


2. I run the job using the command below

spark-submit  --class org.myjob  --jars myjob.jar spark://:7077
myjob.jar

regards
Bala


On 2 November 2015 at 14:59, Romi Kuntsman <r...@totango.com
<mailto:r...@totango.com>> wrote:

except "spark.master", do you have "spark://" anywhere in your code
or config files?

*Romi Kuntsman*, /Big Data Engineer/_
_
http://www.totango.com <http://www.totango.com/>

On Mon, Nov 2, 2015 at 11:27 AM, Balachandar R.A.
<balachandar...@gmail.com <mailto:balachandar...@gmail.com>> wrote:


-- Forwarded message --
From: "Balachandar R.A." <balachandar...@gmail.com
<mailto:balachandar...@gmail.com>>
Date: 02-Nov-2015 12:53 pm
Subject: Re: Error : - No filesystem for scheme: spark
To: "Jean-Baptiste Onofré" <j...@nanthrax.net
<mailto:j...@nanthrax.net>>
Cc:

 > HI JB,
 > Thanks for the response,
 > Here is the content of my spark-defaults.conf
 >
 >
 > # Default system properties included when running spark-submit.
 > # This is useful for setting default environmental settings.
 >
 > # Example:
 >  spark.master spark://fdoat:7077
 > # spark.eventLog.enabled   true
 >  spark.eventLog.dir/home/bala/spark-logs
 > # spark.eventLog.dir   hdfs://namenode:8021/directory
 > # spark.serializer
org.apache.spark.serializer.KryoSerializer
 > # spark.driver.memory  5g
 > # spark.executor.extraJavaOptions  -XX:+PrintGCDetails
    -Dkey=value -Dnumbers="one two three"
 >
 >
 > regards
 > Bala


 >
 > On 2 November 2015 at 12:21, Jean-Baptiste Onofré
<j...@nanthrax.net <mailto:j...@nanthrax.net>> wrote:
 >>
 >> Hi,
 >>
 >> do you have something special in conf/spark-defaults.conf
(especially on the eventLog directory) ?
 >>
 >> Regards
 >> JB
 >>
 >>
 >> On 11/02/2015 07:48 AM, Balachandar R.A. wrote:
 >>>
 >>> Can someone tell me at what point this error could come?
 >>>
 >>> In one of my use cases, I am trying to use hadoop custom
input format.
 >>> Here is my code.
 >>>
 >>> val hConf: Configuration = sc.hadoopConfiguration
 >>> hConf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
 >>> hConf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
 >>> var job = new Job(hConf)
 >>> FileInputFormat.setInputPaths(job, new Path("hdfs:///user/bala/MyBinaryFile"));
 >>> var hRDD = new NewHadoopRDD(sc, classOf[RandomAccessInputFormat],
 >>>   classOf[IntWritable], classOf[BytesWritable], job.getConfiguration())
 >>> val count = hRDD.mapPartitionsWithInputSplit { (split, iter) => myfuncPart(split, iter) }
 >>>
 >>> The moment I invoke mapPartitionsWithInputSplit() method, I get the
 >>> below error in my spark-submit launch:
 >>>
 >>> 15/10/30 11:11:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
 >>> (TID 0, 40.221.94.235): java.io.IOException: No FileSystem for scheme: spark
 >>>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
 >>>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
 >>>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 >>>
 >>> Any help here to move towards fixing this will be of great help
 >>>
 >>>
 >>>
 >>> Thanks
 >>>
 >>> Bala
     >>>
 >>
 >> --
 >> Jean-Baptiste Onof

Re: Error : - No filesystem for scheme: spark

2015-11-02 Thread Jean-Baptiste Onofré

Ah ok. Good catch ;)
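
For reference, the master URL needs to be passed through --master (or the
spark.master property) rather than as a plain positional argument; otherwise it
ends up being resolved as a path, which is presumably what surfaced as the "No
FileSystem for scheme: spark" error. A corrected launch would look roughly like
this (the master host is a placeholder, and --jars is only for additional
dependency jars, not the application jar itself):

  spark-submit --master spark://<master-host>:7077 --class org.myjob myjob.jar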

Regards
JB

On 11/02/2015 11:51 AM, Balachandar R.A. wrote:

I made a stupid mistake, it seems. I now supply the spark:// URL through the
--master option in my launch command, and this error is gone.

Thanks for pointing out possible places for troubleshooting

Regards
Bala

On 02-Nov-2015 3:15 pm, "Balachandar R.A." <balachandar...@gmail.com
<mailto:balachandar...@gmail.com>> wrote:

No.. I am not using yarn. Yarn is not running in my cluster. So, it
is standalone one.

Regards
Bala

On 02-Nov-2015 3:11 pm, "Jean-Baptiste Onofré" <j...@nanthrax.net
<mailto:j...@nanthrax.net>> wrote:

Just to be sure: you use yarn cluster (not standalone), right ?

Regards
JB

On 11/02/2015 10:37 AM, Balachandar R.A. wrote:

Yes. In two different places I use spark://

1. In my code, while creating spark configuration, I use the
code below

val sConf = new
SparkConf().setAppName("Dummy").setMaster("spark://:7077")
val sConf = val sc = new SparkContext(sConf)


2. I run the job using the command below

spark-submit  --class org.myjob  --jars myjob.jar
spark://:7077
myjob.jar

regards
Bala


On 2 November 2015 at 14:59, Romi Kuntsman <r...@totango.com
<mailto:r...@totango.com>
<mailto:r...@totango.com <mailto:r...@totango.com>>> wrote:

 except "spark.master", do you have "spark://" anywhere
in your code
 or config files?

 *Romi Kuntsman*, /Big Data Engineer/_
 _
http://www.totango.com <http://www.totango.com/>

 On Mon, Nov 2, 2015 at 11:27 AM, Balachandar R.A.
 <balachandar...@gmail.com
<mailto:balachandar...@gmail.com>
<mailto:balachandar...@gmail.com
<mailto:balachandar...@gmail.com>>> wrote:


 -- Forwarded message --
 From: "Balachandar R.A." <balachandar...@gmail.com
<mailto:balachandar...@gmail.com>
 <mailto:balachandar...@gmail.com
<mailto:balachandar...@gmail.com>>>
 Date: 02-Nov-2015 12:53 pm
 Subject: Re: Error : - No filesystem for scheme: spark
 To: "Jean-Baptiste Onofré" <j...@nanthrax.net
<mailto:j...@nanthrax.net>
 <mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>>
 Cc:

  > HI JB,
  > Thanks for the response,
  > Here is the content of my spark-defaults.conf
  >
  >
  > # Default system properties included when
running spark-submit.
  > # This is useful for setting default
environmental settings.
  >
  > # Example:
  >  spark.master spark://fdoat:7077
  > # spark.eventLog.enabled   true
  >  spark.eventLog.dir
/home/bala/spark-logs
  > # spark.eventLog.dir
  hdfs://namenode:8021/directory
  > # spark.serializer
 org.apache.spark.serializer.KryoSerializer
  > # spark.driver.memory  5g
  > # spark.executor.extraJavaOptions
-XX:+PrintGCDetails
     -Dkey=value -Dnumbers="one two three"
  >
  >
  > regards
  > Bala


  >
  > On 2 November 2015 at 12:21, Jean-Baptiste Onofré
 <j...@nanthrax.net <mailto:j...@nanthrax.net>
<mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>> wrote:
  >>
  >> Hi,
  >>
  >> do you have something special in
conf/spark-defaults.conf
 (especially on the eventLog directory) ?
  >>
  >> Regards
  >> JB
  >>
  >>
  >> On 11/02/2015 07:48 AM, Balachandar R.A. wrote:
  >>>
  >>> Can someone tell me at what point this error
  

Re: Error : - No filesystem for scheme: spark

2015-11-01 Thread Jean-Baptiste Onofré

Hi,

do you have something special in conf/spark-defaults.conf (especially on 
the eventLog directory) ?


Regards
JB

On 11/02/2015 07:48 AM, Balachandar R.A. wrote:

Can someone tell me at what point this error could occur?

In one of my use cases, I am trying to use a custom Hadoop input format.
Here is my code.

val hConf: Configuration = sc.hadoopConfiguration
hConf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hConf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
var job = new Job(hConf)
FileInputFormat.setInputPaths(job, new Path("hdfs:///user/bala/MyBinaryFile"));
var hRDD = new NewHadoopRDD(sc, classOf[RandomAccessInputFormat],
  classOf[IntWritable], classOf[BytesWritable], job.getConfiguration())
val count = hRDD.mapPartitionsWithInputSplit { (split, iter) => myfuncPart(split, iter) }

The moment I invoke mapPartitionsWithInputSplit() method, I get the
below error in my spark-submit launch:

15/10/30 11:11:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, 40.221.94.235): java.io.IOException: No FileSystem for scheme: spark
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)

Any help here to move towards fixing this will be of great help



Thanks

Bala



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark-1.5.1 application detail ui url

2015-10-29 Thread Jean-Baptiste Onofré

Hi,

The running application UI should be available on the worker IP (on 4040 
default port), right ?


So, basically, the problem is on the link of the master UI, correct ?
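
If it is, one thing that may be worth checking (an assumption on my side, not
verified against 1.5.1) is the SPARK_PUBLIC_DNS environment variable on the
driver/master nodes, which controls the hostname advertised in the UI links,
e.g. in conf/spark-env.sh:

  export SPARK_PUBLIC_DNS=<externally-resolvable-hostname>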

Regards
JB

On 10/29/2015 01:45 PM, carlilek wrote:

I administer an HPC cluster that runs Spark clusters as jobs. We run Spark
over the backend network (typically used for MPI), which is not accessible
outside the cluster. Until we upgraded to 1.5.1 (from 1.3.1), this did not
present a problem. Now the Application Detail UI link is returning the IP
address of the backend network of the driver machine rather than that
machine's hostname. Consequentially, users cannot access that page.  I am
unsure what might have changed or how I might change the behavior back.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-5-1-application-detail-ui-url-tp25226.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Spark on Yarn

2015-10-21 Thread Jean-Baptiste Onofré


Hi,

The compiled version (master side) and the client version diverge on the Spark
network JavaUtils class. You should use the same, aligned version on both sides.
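
A quick way to confirm which jar the conflicting class is actually loaded from
at runtime (just a debugging sketch, e.g. in Scala) is to print its origin from
inside your application:

  // prints the jar URL that provided JavaUtils on the container classpath
  println(getClass.getClassLoader.getResource(
    "org/apache/spark/network/util/JavaUtils.class"))

If that points at the cluster's Spark 1.3 jars rather than your 1.4 fat jar,
the classpath ordering in the YARN container is the thing to fix.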
Regards
JB


Sent from my Samsung device

 Original message 
From: Raghuveer Chanda  
Date: 21/10/2015  12:33  (GMT+01:00) 
To: user@spark.apache.org 
Subject: Spark on Yarn 

Hi all,

I am trying to run Spark on YARN in the quickstart Cloudera VM. It already has
Spark 1.3 and Hadoop 2.6.0-cdh5.4.0 installed. (I am not using spark-submit
since I want to run a different version of Spark.) I am able to run Spark 1.3
on YARN, but I get the error below for Spark 1.4.

The log shows it is running Spark 1.4, yet it still fails on a method which is
present in 1.4 and not in 1.3, even though the fat jar contains the 1.4 class
files.

As far as running on YARN goes, the installed Spark version shouldn't matter,
but it still seems to be picking up the other version.

Hadoop Version: Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar

Error:
LogType: stderr
Log Upload Time: Tue Oct 20 21:58:56 -0700 2015
LogLength: 2334
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/10/simple-yarn-app-1.1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/20 21:58:50 INFO spark.SparkContext: Running Spark version 1.4.0
15/10/20 21:58:53 INFO spark.SecurityManager: Changing view acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: Changing modify acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn); users with modify permissions: Set(yarn)
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.timeStringAsSec(Ljava/lang/String;)J
  at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1027)
  at org.apache.spark.SparkConf.getTimeAsSeconds(SparkConf.scala:194)
  at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:68)
  at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
  at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
  at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
  at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
  at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
  at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
  at com.hortonworks.simpleyarnapp.HelloWorld.main(HelloWorld.java:50)
15/10/20 21:58:53 INFO util.Utils: Shutdown hook called

Please help :)

--
Regards and Thanks,
Raghuveer Chanda




Re: Spark 1.5 Streaming and Kinesis

2015-10-20 Thread Jean-Baptiste Onofré

Hi Phil,

thanks for the Jira, I will try to take a look asap.

Regards
JB

On 10/19/2015 11:07 PM, Phil Kallos wrote:

I am currently trying a few code changes to see if I can squash this
error. I have created https://issues.apache.org/jira/browse/SPARK-11193
to track progress, hope that is okay!

In the meantime, can anyone confirm their ability to run the Kinesis-ASL
example using Spark > 1.5.x ? Would be helpful to know if it works in
some cases but not others.
http://spark.apache.org/docs/1.5.1/streaming-kinesis-integration.html

Thanks
Phil

On Thu, Oct 15, 2015 at 10:35 PM, Jean-Baptiste Onofré <j...@nanthrax.net
<mailto:j...@nanthrax.net>> wrote:

Hi Phil,

sorry I didn't have time to investigate yesterday (I was on a couple
of other Apache projects ;)). I will try to do it today. I keep you
posted.

Regards
JB

On 10/16/2015 07:21 AM, Phil Kallos wrote:

JB,

To clarify, you are able to run the Amazon Kinesis example
provided in
the spark examples dir?

bin/run-example streaming.KinesisWordCountASL [app name] [stream
name]
[endpoint url] ?

If it helps, below are the steps I used to build spark

mvn -Pyarn -Pkinesis-asl -Phadoop-2.6 -DskipTests clean package

And I did this with revision
4f894dd6906311cb57add6757690069a18078783
(v.1.5.1)

Thanks,
Phil


On Thu, Oct 15, 2015 at 2:31 AM, Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
<mailto:cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>>>
wrote:

 So running it using spark-submit doesnt change anything, it
still works.

 When reading the code

https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L100
 it looks like the receivers are definitely being ser/de. I
think
 this is the issue, need to find a way to confirm that now...

 2015-10-15 16:12 GMT+07:00 Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
 <mailto:cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>>>:

 Hey,

 A quick update on other things that have been tested.

 When looking at the compiled code of the
 spark-streaming-kinesis-asl jar everything looks normal
(there
 is a class that implements SyncMap and it is used
inside the
 receiver).
 Starting a spark shell and using introspection to
instantiate a
 receiver and check that blockIdToSeqNumRanges
implements SyncMap
 works too. So obviously it has the correct type
according to that.

 Another thing to test could be to do the same introspection
 stuff but inside a spark job to make sure it is not a
problem in
 the way the jobs are run.
 The other idea would be that this is a problem related to
 ser/de. For example if the receiver was being
serialized and
 then deserialized it could definitely happen depending
on the
 lib used and its configuration that it just doesn't
preserve the
 concrete type. So it would deserialize using the
compile type
 instead of the runtime type.

 Cheers,
     Eugen


 2015-10-15 13:41 GMT+07:00 Jean-Baptiste Onofré
<j...@nanthrax.net <mailto:j...@nanthrax.net>
 <mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>>:

 Thanks for the update Phil.

 I'm preparing a environment to reproduce it.

 I keep you posted.

 Thanks again,
 Regards
 JB

 On 10/15/2015 08:36 AM, Phil Kallos wrote:

 Not a dumb question, but yes I updated all of the
 library references to
 1.5, including  (even tried 1.5.1).

 // Versions.spark set elsewhere to "1.5.0"
 "org.apache.spark" %%
"spark-streaming-kinesis-asl" %
 Versions.spark %
 "provided"

 I am experiencing the issue in my own spark
project, but
 also when I try
 to run the spark streaming kinesis example that
comes in
 spark/examples

 Tried running the streaming jo

Re: Spark 1.5 Streaming and Kinesis

2015-10-20 Thread Jean-Baptiste Onofré

Hi Phil,

did you see my comments in the Jira ?

Can you provide an update in the Jira please ? Thanks !

Regards
JB

On 10/19/2015 11:07 PM, Phil Kallos wrote:

I am currently trying a few code changes to see if I can squash this
error. I have created https://issues.apache.org/jira/browse/SPARK-11193
to track progress, hope that is okay!

In the meantime, can anyone confirm their ability to run the Kinesis-ASL
example using Spark > 1.5.x ? Would be helpful to know if it works in
some cases but not others.
http://spark.apache.org/docs/1.5.1/streaming-kinesis-integration.html

Thanks
Phil

On Thu, Oct 15, 2015 at 10:35 PM, Jean-Baptiste Onofré <j...@nanthrax.net
<mailto:j...@nanthrax.net>> wrote:

Hi Phil,

sorry I didn't have time to investigate yesterday (I was on a couple
of other Apache projects ;)). I will try to do it today. I keep you
posted.

Regards
JB

On 10/16/2015 07:21 AM, Phil Kallos wrote:

JB,

To clarify, you are able to run the Amazon Kinesis example
provided in
the spark examples dir?

bin/run-example streaming.KinesisWordCountASL [app name] [stream
name]
[endpoint url] ?

If it helps, below are the steps I used to build spark

mvn -Pyarn -Pkinesis-asl -Phadoop-2.6 -DskipTests clean package

And I did this with revision
4f894dd6906311cb57add6757690069a18078783
(v.1.5.1)

Thanks,
Phil


On Thu, Oct 15, 2015 at 2:31 AM, Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
<mailto:cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>>>
wrote:

 So running it using spark-submit doesnt change anything, it
still works.

 When reading the code

https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L100
 it looks like the receivers are definitely being ser/de. I
think
 this is the issue, need to find a way to confirm that now...

 2015-10-15 16:12 GMT+07:00 Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
 <mailto:cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>>>:

 Hey,

 A quick update on other things that have been tested.

 When looking at the compiled code of the
 spark-streaming-kinesis-asl jar everything looks normal
(there
 is a class that implements SyncMap and it is used
inside the
 receiver).
 Starting a spark shell and using introspection to
instantiate a
 receiver and check that blockIdToSeqNumRanges
implements SyncMap
 works too. So obviously it has the correct type
according to that.

 Another thing to test could be to do the same introspection
 stuff but inside a spark job to make sure it is not a
problem in
 the way the jobs are run.
 The other idea would be that this is a problem related to
 ser/de. For example if the receiver was being
serialized and
 then deserialized it could definitely happen depending
on the
 lib used and its configuration that it just doesn't
preserve the
 concrete type. So it would deserialize using the
compile type
 instead of the runtime type.

 Cheers,
     Eugen


 2015-10-15 13:41 GMT+07:00 Jean-Baptiste Onofré
<j...@nanthrax.net <mailto:j...@nanthrax.net>
 <mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>>:

 Thanks for the update Phil.

 I'm preparing a environment to reproduce it.

 I keep you posted.

 Thanks again,
 Regards
 JB

 On 10/15/2015 08:36 AM, Phil Kallos wrote:

 Not a dumb question, but yes I updated all of the
 library references to
 1.5, including  (even tried 1.5.1).

 // Versions.spark set elsewhere to "1.5.0"
 "org.apache.spark" %%
"spark-streaming-kinesis-asl" %
 Versions.spark %
 "provided"

 I am experiencing the issue in my own spark
project, but
 also when I try
 to run the spark streaming kinesis example that
comes in
 spark/examples


Re: Spark 1.5 Streaming and Kinesis

2015-10-20 Thread Jean-Baptiste Onofré

Hi Phil,

as you can see in the Jira, I tried with both Spark 1.5.1 and 
1.6.0-SNAPSHOT, and I'm not able to reproduce your issue.


What's your Java version (and I guess you use scala 2.10 for 
compilation, not -Pscala-2.11) ?


Regards
JB

On 10/20/2015 03:59 PM, Jean-Baptiste Onofré wrote:

Hi Phil,

did you see my comments in the Jira ?

Can you provide an update in the Jira please ? Thanks !

Regards
JB

On 10/19/2015 11:07 PM, Phil Kallos wrote:

I am currently trying a few code changes to see if I can squash this
error. I have created https://issues.apache.org/jira/browse/SPARK-11193
to track progress, hope that is okay!

In the meantime, can anyone confirm their ability to run the Kinesis-ASL
example using Spark > 1.5.x ? Would be helpful to know if it works in
some cases but not others.
http://spark.apache.org/docs/1.5.1/streaming-kinesis-integration.html

Thanks
Phil

On Thu, Oct 15, 2015 at 10:35 PM, Jean-Baptiste Onofré <j...@nanthrax.net
<mailto:j...@nanthrax.net>> wrote:

Hi Phil,

sorry I didn't have time to investigate yesterday (I was on a couple
of other Apache projects ;)). I will try to do it today. I keep you
posted.

Regards
JB

On 10/16/2015 07:21 AM, Phil Kallos wrote:

JB,

To clarify, you are able to run the Amazon Kinesis example
provided in
the spark examples dir?

bin/run-example streaming.KinesisWordCountASL [app name] [stream
name]
[endpoint url] ?

If it helps, below are the steps I used to build spark

mvn -Pyarn -Pkinesis-asl -Phadoop-2.6 -DskipTests clean package

And I did this with revision
4f894dd6906311cb57add6757690069a18078783
(v.1.5.1)

Thanks,
Phil


On Thu, Oct 15, 2015 at 2:31 AM, Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
<mailto:cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>>>
wrote:

 So running it using spark-submit doesnt change anything, it
still works.

 When reading the code

https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L100

 it looks like the receivers are definitely being ser/de. I
think
 this is the issue, need to find a way to confirm that now...

 2015-10-15 16:12 GMT+07:00 Eugen Cepoi
<cepoi.eu...@gmail.com <mailto:cepoi.eu...@gmail.com>
 <mailto:cepoi.eu...@gmail.com
<mailto:cepoi.eu...@gmail.com>>>:

 Hey,

 A quick update on other things that have been tested.

 When looking at the compiled code of the
 spark-streaming-kinesis-asl jar everything looks normal
(there
 is a class that implements SyncMap and it is used
inside the
 receiver).
 Starting a spark shell and using introspection to
instantiate a
 receiver and check that blockIdToSeqNumRanges
implements SyncMap
 works too. So obviously it has the correct type
according to that.

 Another thing to test could be to do the same
introspection
 stuff but inside a spark job to make sure it is not a
problem in
 the way the jobs are run.
 The other idea would be that this is a problem
related to
 ser/de. For example if the receiver was being
serialized and
 then deserialized it could definitely happen depending
on the
 lib used and its configuration that it just doesn't
preserve the
 concrete type. So it would deserialize using the
compile type
 instead of the runtime type.

 Cheers,
     Eugen


 2015-10-15 13:41 GMT+07:00 Jean-Baptiste Onofré
<j...@nanthrax.net <mailto:j...@nanthrax.net>
 <mailto:j...@nanthrax.net <mailto:j...@nanthrax.net>>>:

 Thanks for the update Phil.

 I'm preparing a environment to reproduce it.

 I keep you posted.

 Thanks again,
 Regards
 JB

 On 10/15/2015 08:36 AM, Phil Kallos wrote:

 Not a dumb question, but yes I updated all of
the
 library references to
 1.5, including  (even tried 1.5.1).

 // Versions.spark set elsewhere to "1.5.0"
 "org.apache.spark" %%
"spark-streaming-kinesis-asl" %
 Versions.spark %
 "provided"

   

Re: Can not subscript to mailing list

2015-10-20 Thread Jean-Baptiste Onofré

Hi Jeff,

did you try to send an e-mail to subscribe-u...@spark.apache.org ?

Then you will receive a confirmation e-mail, you just have to reply 
(nothing special to put in the subject, the important thing is the 
reply-to unique address).


Regards
JB

On 10/20/2015 05:48 PM, jeff.sadow...@gmail.com wrote:

I am having issues subscribing to the user@spark.apache.org mailing list.

I would like to be added to the mailing list so I can post some
configuration questions I have to the list that I do not see asked on the
list.

When I tried adding myself I got an email titled "confirm subscribe to
user@spark.apache.org" but after replying as it says to do I get nothing. I
tried today to remove and re-add myself and I got a reply back saying I was
not on the list when trying to unsubscribe. When I tried to add myself again
I don't get any emails from it this time. I'm getting other email from other
people and nothing is in spam. I tried with a second email account as well
and the same thing is happening on it. I got the initial "confirm subscribe
to user@spark.apache.org" email but after replying I get nothing. I can't
even get another "confirm subscribe to user@spark.apache.org" message. Both
of my emails are from google servers one is an organization email the first
is a personal google email




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-not-subscript-to-mailing-list-tp25143.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark 1.5 Streaming and Kinesis

2015-10-15 Thread Jean-Baptiste Onofré

Thanks for the update Phil.

I'm preparing an environment to reproduce it.

I keep you posted.

Thanks again,
Regards
JB

On 10/15/2015 08:36 AM, Phil Kallos wrote:

Not a dumb question, but yes I updated all of the library references to
1.5, including  (even tried 1.5.1).

// Versions.spark set elsewhere to "1.5.0"
"org.apache.spark" %% "spark-streaming-kinesis-asl" % Versions.spark %
"provided"

I am experiencing the issue in my own spark project, but also when I try
to run the spark streaming kinesis example that comes in spark/examples

Tried running the streaming job locally, and also in EMR with release
4.1.0 that includes Spark 1.5

Very strange!

-- Forwarded message ------

From: "Jean-Baptiste Onofré" <j...@nanthrax.net <mailto:j...@nanthrax.net>>
To: user@spark.apache.org <mailto:user@spark.apache.org>
Cc:
Date: Thu, 15 Oct 2015 08:03:55 +0200
Subject: Re: Spark 1.5 Streaming and Kinesis
Hi Phil,
KinesisReceiver is part of extra. Just a dumb question: did you
update all, including the Spark Kinesis extra containing the
KinesisReceiver ?
I checked on tag v1.5.0, and at line 175 of the KinesisReceiver, we see:
blockIdToSeqNumRanges.clear()
which is a:
private val blockIdToSeqNumRanges = new
mutable.HashMap[StreamBlockId, SequenceNumberRanges]
 with mutable.SynchronizedMap[StreamBlockId, SequenceNumberRanges]
So, it doesn't look fully correct to me.
Let me investigate a bit this morning.
Regards
JB
On 10/15/2015 07:49 AM, Phil Kallos wrote:
We are trying to migrate from Spark1.4 to Spark1.5 for our Kinesis
streaming applications, to take advantage of the new Kinesis
checkpointing improvements in 1.5.
However after upgrading, we are consistently seeing the following error:
java.lang.ClassCastException: scala.collection.mutable.HashMap cannot be
cast to scala.collection.mutable.SynchronizedMap
at

org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:175)
at

org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at

org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at

org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at

org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at
org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at
org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I even get this when running the Kinesis examples :
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html with
bin/run-example streaming.KinesisWordCountASL
    Am I doing something incorrect?


--
Jean-Baptiste Onofré
jbono...@apache.org <mailto:jbono...@apache.org>
http://blog.nanthrax.net <http://blog.nanthrax.net/>
Talend - http://www.talend.com <http://www.talend.com/>

Hi,



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark 1.5 Streaming and Kinesis

2015-10-15 Thread Jean-Baptiste Onofré

Hi Phil,

sorry I didn't have time to investigate yesterday (I was on a couple of 
other Apache projects ;)). I will try to do it today. I keep you posted.


Regards
JB

On 10/16/2015 07:21 AM, Phil Kallos wrote:

JB,

To clarify, you are able to run the Amazon Kinesis example provided in
the spark examples dir?

bin/run-example streaming.KinesisWordCountASL [app name] [stream name]
[endpoint url] ?

If it helps, below are the steps I used to build spark

mvn -Pyarn -Pkinesis-asl -Phadoop-2.6 -DskipTests clean package

And I did this with revision 4f894dd6906311cb57add6757690069a18078783
(v.1.5.1)

Thanks,
Phil


On Thu, Oct 15, 2015 at 2:31 AM, Eugen Cepoi <cepoi.eu...@gmail.com
<mailto:cepoi.eu...@gmail.com>> wrote:

So running it using spark-submit doesnt change anything, it still works.

When reading the code

https://github.com/apache/spark/blob/branch-1.5/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L100
it looks like the receivers are definitely being ser/de. I think
this is the issue, need to find a way to confirm that now...

2015-10-15 16:12 GMT+07:00 Eugen Cepoi <cepoi.eu...@gmail.com
<mailto:cepoi.eu...@gmail.com>>:

Hey,

A quick update on other things that have been tested.

When looking at the compiled code of the
spark-streaming-kinesis-asl jar everything looks normal (there
is a class that implements SyncMap and it is used inside the
receiver).
Starting a spark shell and using introspection to instantiate a
receiver and check that blockIdToSeqNumRanges implements SyncMap
works too. So obviously it has the correct type according to that.

Another thing to test could be to do the same introspection
stuff but inside a spark job to make sure it is not a problem in
the way the jobs are run.
The other idea would be that this is a problem related to
ser/de. For example if the receiver was being serialized and
then deserialized it could definitely happen depending on the
lib used and its configuration that it just doesn't preserve the
concrete type. So it would deserialize using the compile type
instead of the runtime type.

Cheers,
Eugen


2015-10-15 13:41 GMT+07:00 Jean-Baptiste Onofré <j...@nanthrax.net
<mailto:j...@nanthrax.net>>:

Thanks for the update Phil.

I'm preparing a environment to reproduce it.

I keep you posted.

Thanks again,
Regards
JB

On 10/15/2015 08:36 AM, Phil Kallos wrote:

Not a dumb question, but yes I updated all of the
library references to
1.5, including  (even tried 1.5.1).

// Versions.spark set elsewhere to "1.5.0"
"org.apache.spark" %% "spark-streaming-kinesis-asl" %
Versions.spark %
"provided"

I am experiencing the issue in my own spark project, but
also when I try
to run the spark streaming kinesis example that comes in
spark/examples

Tried running the streaming job locally, and also in EMR
with release
4.1.0 that includes Spark 1.5

Very strange!

     ------ Forwarded message --

 From: "Jean-Baptiste Onofré" <j...@nanthrax.net
<mailto:j...@nanthrax.net> <mailto:j...@nanthrax.net
<mailto:j...@nanthrax.net>>>
 To: user@spark.apache.org
<mailto:user@spark.apache.org>
<mailto:user@spark.apache.org
<mailto:user@spark.apache.org>>

 Cc:
 Date: Thu, 15 Oct 2015 08:03:55 +0200
 Subject: Re: Spark 1.5 Streaming and Kinesis
 Hi Phil,
 KinesisReceiver is part of extra. Just a dumb
question: did you
 update all, including the Spark Kinesis extra
containing the
 KinesisReceiver ?
 I checked on tag v1.5.0, and at line 175 of the
KinesisReceiver, we see:
 blockIdToSeqNumRanges.clear()
 which is a:
 private val blockIdToSeqNumRanges = new
 mutable.HashMap[StreamBlockId, SequenceNumberRanges]
  with mutable.SynchronizedMap[StreamBlockId,
SequenceNumberRanges]
 So, it doesn't look fully correct to me.
 Let me investigate a bit this morning.
 Regards
  

Re: Spark 1.5 Streaming and Kinesis

2015-10-15 Thread Jean-Baptiste Onofré

Hi Phil,

KinesisReceiver is part of extra. Just a dumb question: did you update 
all, including the Spark Kinesis extra containing the KinesisReceiver ?


I checked on tag v1.5.0, and at line 175 of the KinesisReceiver, we see:

blockIdToSeqNumRanges.clear()

which is a:

private val blockIdToSeqNumRanges = new mutable.HashMap[StreamBlockId, 
SequenceNumberRanges]

with mutable.SynchronizedMap[StreamBlockId, SequenceNumberRanges]

So, it doesn't look fully correct to me.

Let me investigate a bit this morning.

Regards
JB

On 10/15/2015 07:49 AM, Phil Kallos wrote:

Hi,

We are trying to migrate from Spark1.4 to Spark1.5 for our Kinesis
streaming applications, to take advantage of the new Kinesis
checkpointing improvements in 1.5.

However after upgrading, we are consistently seeing the following error:

java.lang.ClassCastException: scala.collection.mutable.HashMap cannot be
cast to scala.collection.mutable.SynchronizedMap
at
org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:175)
at
org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at
org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I even get this when running the Kinesis examples :
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html with

bin/run-example streaming.KinesisWordCountASL

Am I doing something incorrect?




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark 1.5 Streaming and Kinesis

2015-10-15 Thread Jean-Baptiste Onofré
By correct, I mean: the map declaration looks good to me, so the 
ClassCastException is weird ;)


I'm trying to reproduce the issue in order to investigate.

Regards
JB

On 10/15/2015 08:03 AM, Jean-Baptiste Onofré wrote:

Hi Phil,

KinesisReceiver is part of extra. Just a dumb question: did you update
all, including the Spark Kinesis extra containing the KinesisReceiver ?

I checked on tag v1.5.0, and at line 175 of the KinesisReceiver, we see:

blockIdToSeqNumRanges.clear()

which is a:

private val blockIdToSeqNumRanges = new mutable.HashMap[StreamBlockId,
SequenceNumberRanges]
 with mutable.SynchronizedMap[StreamBlockId, SequenceNumberRanges]

So, it doesn't look fully correct to me.

Let me investigate a bit this morning.

Regards
JB

On 10/15/2015 07:49 AM, Phil Kallos wrote:

Hi,

We are trying to migrate from Spark1.4 to Spark1.5 for our Kinesis
streaming applications, to take advantage of the new Kinesis
checkpointing improvements in 1.5.

However after upgrading, we are consistently seeing the following error:

java.lang.ClassCastException: scala.collection.mutable.HashMap cannot be
cast to scala.collection.mutable.SynchronizedMap
at
org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:175)

at
org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)

at
org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)

at
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)

at
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)

at
org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at
org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

I even get this when running the Kinesis examples :
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html
with

bin/run-example streaming.KinesisWordCountASL

Am I doing something incorrect?






--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why is my spark executor is terminated?

2015-10-14 Thread Jean-Baptiste Onofré

Hi Ningjun

I just wanted to check that the master didn't "kick out" the worker, as 
the "Disassociated" can come from the master.


Here it looks like the worker killed the executor before shutting itself
down.


What's the Spark version ?

Regards
JB

On 10/14/2015 04:42 PM, Wang, Ningjun (LNG-NPV) wrote:

I checked the master log before and did not find anything wrong. Unfortunately I 
have lost the master log now.

So you think the master log will tell why the executor is down?

Regards,

Ningjun Wang


-Original Message-
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, October 13, 2015 10:42 AM
To: user@spark.apache.org
Subject: Re: Why is my spark executor is terminated?

Hi Ningjun,

Nothing special in the master log ?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:

We use Spark on Windows 2008 R2 servers. We use one Spark context,
which creates one Spark executor. We run the Spark master, slave, driver,
and executor on one single machine.

  From time to time, we found that the executor Java process was
terminated. I cannot figure out why it was terminated. Can anybody help
me find out why the executor was terminated?

The spark slave log. It shows that it kill the executor process

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-/0

But why does it do that?

Here is the detailed logs from spark slave

2015-10-13 09:58:04,915 WARN
[sparkWorker-akka.actor.default-dispatcher-16]
remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71))
- Association with remote system
[akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.EndpointWriter$AckIdleCheckTimer$] from
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter
-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234
-2/endpointWriter#-175670388]
to
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter
-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234
-2/endpointWriter#-175670388] was not delivered. [2] dead letters
encountered. This logging can be turned off or adjusted with
configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.AssociationHandle$Disassociated] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/ak
kaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125
680] was not delivered. [3] dead letters encountered. This logging can
be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
from Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/ak
kaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125
680] was not delivered. [4] dead letters encountered. This logging can
be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-/0

2015-10-13 09:58:06,103 INFO  [ExecutorRunner for
app-20151009201453-/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Runner thread for executor
app-20151009201453-/0 interrupted

2015-10-13 09:58:06,118 INFO  [ExecutorRunner for
app-20151009201453-/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Killing process!

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Executor app-20151009201453-/0
finished with state KILLED exitStatus 1

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Cleaning up local directories for
application app-20151009201453-

Thanks

Ningjun Wang



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbon

Re: Why is the Spark Web GUI failing with JavaScript "Uncaught SyntaxError"?

2015-10-13 Thread Jean-Baptiste Onofré

Hi Joshua,

What's the Spark version and what's your browser ?

I just tried on Spark 1.6-SNAPSHOT with firefox and it works fine.

Thanks
Regards
JB

On 10/13/2015 02:17 PM, Joshua Fox wrote:

I am accessing the Spark Jobs Web GUI, running on AWS EMR.

I can access this webapp (port 4040 as per default), but it only
half-renders, producing "Uncaught SyntaxError: Unexpected token <"

Here is a screenshot <http://i.imgur.com/qP2rH46.png> including Chrome
Developer Console.

Screenshot <http://i.stack.imgur.com/cf8gp.png>

Here are some of the error messages in my Chrome console.

Uncaught SyntaxError: Unexpected token <
(index):3 Resource interpreted as Script but transferred with MIME type
text/html:
"http://ec2-52-89-59-167.us-west-2.compute.amazonaws.com:4040/jobs/".
(index):74 Uncaught ReferenceError: drawApplicationTimeline is not defined
(index):12 Resource interpreted as Image but transferred with MIME type
text/html:
"http://ec2-52-89-59-167.us-west-2.compute.amazonaws.com:4040/jobs/"

Note that the History GUI at port 18080 and the Hadoop GUI at port 8088
work fine, and the Spark jobs GUI does partly render. So, it seems that
my browser proxy is not the cause of this problem.

Joshua


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why is the Spark Web GUI failing with JavaScript "Uncaught SyntaxError"?

2015-10-13 Thread Jean-Baptiste Onofré

Thanks for the update Joshua.

Let me try with Spark 1.4.1.

I keep you posted.

Regards
JB

On 10/13/2015 04:17 PM, Joshua Fox wrote:

  * Spark 1.4.1, part of EMR emr-4.0.0
  * Chrome Version 41.0.2272.118 (64-bit) on Ubuntu


On Tue, Oct 13, 2015 at 3:27 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Joshua,

What's the Spark version and what's your browser ?

I just tried on Spark 1.6-SNAPSHOT with firefox and it works fine.

Thanks
Regards
JB

On 10/13/2015 02:17 PM, Joshua Fox wrote:

I am accessing the Spark Jobs Web GUI, running on AWS EMR.

I can access this webapp (port 4040 as per default), but it only
half-renders, producing "Uncaught SyntaxError: Unexpected token <"

Here is a screenshot <http://i.imgur.com/qP2rH46.png> including
Chrome
Developer Console.

Screenshot <http://i.stack.imgur.com/cf8gp.png>

Here are some of the error messages in my Chrome console.

Uncaught SyntaxError: Unexpected token <
(index):3 Resource interpreted as Script but transferred with
MIME type
text/html:
"http://ec2-52-89-59-167.us-west-2.compute.amazonaws.com:4040/jobs/".
(index):74 Uncaught ReferenceError: drawApplicationTimeline is
not defined
(index):12 Resource interpreted as Image but transferred with
MIME type
text/html:
"http://ec2-52-89-59-167.us-west-2.compute.amazonaws.com:4040/jobs/"

Note that the History GUI at port 18080 and the Hadoop GUI at
port 8088
work fine, and the Spark jobs GUI does partly render. So, it
seems that
my browser proxy is not the cause of this problem.

Joshua


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark shuffle service does not work in stand alone

2015-10-13 Thread Jean-Baptiste Onofré

Hi,

AFAIK, the shuffle service makes sense only to delegate the shuffle to
MapReduce (as the MapReduce shuffle is most of the time faster than the
Spark shuffle).

As you run in standalone mode, the shuffle service will use the Spark shuffle.

Not 100% sure, though.
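
For reference, a sketch of the settings that usually go together for
dynamic allocation (assumption: the standalone Workers themselves are
started with spark.shuffle.service.enabled=true so that they host the
shuffle service; the values below are placeholders, not a fix for the
ExecutorLostFailure itself):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")     // executors fetch shuffle blocks via the service
  .set("spark.dynamicAllocation.enabled", "true")   // dynamic allocation requires the shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")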

Regards
JB

On 10/13/2015 04:23 PM, saif.a.ell...@wellsfargo.com wrote:

Has anyone tried the shuffle service in standalone cluster mode? I want to
enable it for dynamic allocation (d.a.), but my jobs never start when I
submit them. This happens with all my jobs.
15/10/13 08:29:45 INFO DAGScheduler: Job 0 failed: json at
DataLoader.scala:86, took 16.318615 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted
due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
failure: Lost task 0.3 in stage 0.0 (TID 7, 162.101.194.47):
ExecutorLostFailure (executor 4 lost)
Driver stacktrace:
 at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
 at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
 at scala.Option.foreach(Option.scala:236)
 at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
 at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
 at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
 at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1942)
 at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1003)
 at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
 at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
 at org.apache.spark.rdd.RDD.reduce(RDD.scala:985)
 at
org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1114)
 at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
 at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
 at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1091)
 at
org.apache.spark.sql.execution.datasources.json.InferSchema$.apply(InferSchema.scala:58)
 at
org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:105)
 at
org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
 at scala.Option.getOrElse(Option.scala:120)
 at
org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
 at
org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
 at
org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
 at
org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
 at
org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:31)
 at
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
 at
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
 at
org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:219)
 at
org.apache.saif.loaders.DataLoader$.load_json(DataLoader.scala:86)


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why is my spark executor is terminated?

2015-10-13 Thread Jean-Baptiste Onofré

Hi Ningjun,

Nothing special in the master log ?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:

We use Spark on Windows 2008 R2 servers. We use one Spark context, which
creates one Spark executor. We run the Spark master, slave, driver, and
executor on one single machine.

 From time to time, we found that the executor Java process was
terminated. I cannot figure out why it was terminated. Can anybody help me
find out why the executor was terminated?

The spark slave log. It shows that it kill the executor process

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-/0

But why does it do that?

Here is the detailed logs from spark slave

2015-10-13 09:58:04,915 WARN
[sparkWorker-akka.actor.default-dispatcher-16]
remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71)) -
Association with remote system
[akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has
failed, address is now gated for [5000] ms. Reason is: [Disassociated].

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.EndpointWriter$AckIdleCheckTimer$] from
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388]
to
Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388]
was not delivered. [2] dead letters encountered. This logging can be
turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.AssociationHandle$Disassociated] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680]
was not delivered. [3] dead letters encountered. This logging can be
turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO
[sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef
(Slf4jLogger.scala:apply$mcV$sp(74)) - Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying]
from Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680]
was not delivered. [4] dead letters encountered. This logging can be
turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:06,087 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Asked to kill executor
app-20151009201453-/0

2015-10-13 09:58:06,103 INFO  [ExecutorRunner for
app-20151009201453-/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Runner thread for executor
app-20151009201453-/0 interrupted

2015-10-13 09:58:06,118 INFO  [ExecutorRunner for
app-20151009201453-/0] worker.ExecutorRunner
(Logging.scala:logInfo(59)) - Killing process!

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Executor app-20151009201453-/0
finished with state KILLED exitStatus 1

2015-10-13 09:58:06,509 INFO
[sparkWorker-akka.actor.default-dispatcher-16] worker.Worker
(Logging.scala:logInfo(59)) - Cleaning up local directories for
application app-20151009201453-

Thanks

Ningjun Wang



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Does Spark use more memory than MapReduce?

2015-10-12 Thread Jean-Baptiste Onofré

Hi,

I think it depends on the storage level you use (MEMORY, DISK, or 
MEMORY_AND_DISK).

By default, the micro-batching designed into Spark requires more memory but 
is much faster: when you use MapReduce, each map and reduce task has to 
use HDFS as the backend of the data pipeline between tasks. In Spark, a 
disk flush is not always performed: it tries to keep data in memory as 
much as possible. So, there is a balance to find between fast 
processing/micro-batching and memory consumption.
In some cases, using the disk is faster anyway (for instance, a 
MapReduce shuffle can be faster than a Spark shuffle, but you have an 
option to run a ShuffleMapReduceTask from Spark).
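
For example, a small sketch of picking the storage level explicitly
(using throwaway RDDs built with sc.parallelize):

import org.apache.spark.storage.StorageLevel

val someRdd  = sc.parallelize(1 to 1000000)
val otherRdd = sc.parallelize(1 to 1000000)

// Keep what fits in memory and spill the remaining partitions to local disk
val cached = someRdd.persist(StorageLevel.MEMORY_AND_DISK)

// Memory only: partitions that don't fit are recomputed when needed instead of spilled
val inMemory = otherRdd.persist(StorageLevel.MEMORY_ONLY)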


I'm speaking under cover of the experts ;)

Regards
JB

On 10/12/2015 06:52 PM, YaoPau wrote:

I had this question come up and I'm not sure how to answer it.  A user said
that, for a big job, he thought it would be better to use MapReduce since it
writes to disk between iterations instead of keeping the data in memory the
entire time like Spark generally does.

I mentioned that Spark can cache to disk as well, but I'm not sure about the
overarching question (which I realize is vague): for a typical job, would
Spark use more memory than a MapReduce job?  Are there any memory usage
inefficiencies from either?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-use-more-memory-than-MapReduce-tp25030.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Build Failure

2015-10-08 Thread Jean-Baptiste Onofré

Hi,

I just tried and it works for me (I don't have any Maven mirror on my 
subnet).


Can you try again? Maybe it was a temporary issue accessing Maven 
Central.


The artifact is present on central:

http://repo1.maven.org/maven2/com/twitter/algebird-core_2.10/0.9.0/

Regards
JB

On 10/08/2015 09:55 AM, shahid qadri wrote:

Hi,

I tried to build the latest master branch of Spark with:
build/mvn -DskipTests clean package


Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [03:46 min]
[INFO] Spark Project Test Tags  SUCCESS [01:02 min]
[INFO] Spark Project Launcher . SUCCESS [01:03 min]
[INFO] Spark Project Networking ... SUCCESS [ 30.794 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [ 29.496 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 18.478 s]
[INFO] Spark Project Core . SUCCESS [05:42 min]
[INFO] Spark Project Bagel  SUCCESS [  6.082 s]
[INFO] Spark Project GraphX ... SUCCESS [ 23.478 s]
[INFO] Spark Project Streaming  SUCCESS [ 53.969 s]
[INFO] Spark Project Catalyst . SUCCESS [02:12 min]
[INFO] Spark Project SQL .. SUCCESS [03:02 min]
[INFO] Spark Project ML Library ... SUCCESS [02:57 min]
[INFO] Spark Project Tools  SUCCESS [  3.139 s]
[INFO] Spark Project Hive . SUCCESS [03:25 min]
[INFO] Spark Project REPL . SUCCESS [ 18.303 s]
[INFO] Spark Project Assembly . SUCCESS [01:40 min]
[INFO] Spark Project External Twitter . SUCCESS [ 16.707 s]
[INFO] Spark Project External Flume Sink .. SUCCESS [ 52.234 s]
[INFO] Spark Project External Flume ... SUCCESS [ 13.069 s]
[INFO] Spark Project External Flume Assembly .. SUCCESS [  4.653 s]
[INFO] Spark Project External MQTT  SUCCESS [01:56 min]
[INFO] Spark Project External MQTT Assembly ... SUCCESS [ 15.233 s]
[INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.267 s]
[INFO] Spark Project External Kafka ... SUCCESS [ 41.663 s]
[INFO] Spark Project Examples . FAILURE [07:36 min]
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 40:07 min
[INFO] Finished at: 2015-10-08T13:14:31+05:30
[INFO] Final Memory: 373M/1205M
[INFO] 
[ERROR] Failed to execute goal on project spark-examples_2.10: Could not resolve 
dependencies for project org.apache.spark:spark-examples_2.10:jar:1.6.0-SNAPSHOT: 
The following artifacts could not be resolved: 
com.twitter:algebird-core_2.10:jar:0.9.0, com.github.stephenc:jamm:jar:0.2.5: 
Could not transfer artifact com.twitter:algebird-core_2.10:jar:0.9.0 from/to 
central (https://repo1.maven.org/maven2): GET request of: 
com/twitter/algebird-core_2.10/0.9.0/algebird-core_2.10-0.9.0.jar from central 
failed: Connection reset -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :spark-examples_2.10
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Running Spark in Yarn-client mode

2015-10-07 Thread Jean-Baptiste Onofré

Hi Sushrut,

Which packaging of Spark do you use (spark-hadoop-x)?
Do you have a working YARN cluster (with at least one worker)?

Regards
JB

On 10/08/2015 07:23 AM, Sushrut Ikhar wrote:

Hi,
I am new to Spark and I have been trying to run Spark in yarn-client mode.

I get this error in yarn logs :
Error: Could not find or load main class
org.apache.spark.executor.CoarseGrainedExecutorBackend

Also, I keep getting these warnings:

WARN YarnScheduler: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources

WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has
disassociated

WARN ReliableDeliverySupervisor: Association with remote system
[akka.tcp://sparkYarnAM@] has failed, address is now
gated for [5000] ms. Reason is: [Disassociated].

I believe that executors are starting but are unable to connect back to
the driver.
How do I resolve this?
Also, I need help in locating the driver and executor node logs.

Thanks.

Regards,

Sushrut Ikhar
https://about.me/sushrutikhar




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Help needed to reproduce bug

2015-10-06 Thread Jean-Baptiste Onofré

Hi Nick,

I will try to reproduce your issue on a couple of environments. I just 
wanted to know which kind of environment you use: Spark standalone, Spark 
on YARN, or Spark on Mesos?

For you, does it occur with any transform() on any RDD, or only with a 
specific RDD?

I plan to use your code in a main and run it with spark-submit: do you use 
that kind of deployment?


Thanks !
Regards
JB

On 10/07/2015 07:18 AM, pnpritchard wrote:

Hi spark community,

I was hoping someone could help me by running a code snippet below in the
spark shell, and seeing if they see the same buggy behavior I see. Full
details of the bug can be found in this JIRA issue I filed:
https://issues.apache.org/jira/browse/SPARK-10942.

The issue was closed due to cannot reproduce, however, I can't seem to shake
it. I have worked on this for a while, removing all known variables, and
trying different versions of spark (1.5.0, 1.5.1, master), and different OSs
(Mac OSX, Debian Linux). My coworkers have tried as well and see the same
behavior. This has me convinced that I cannot be the only one in the
community to be able to produce this.

If you have a minute or two, please open a spark shell and copy/paste the
below code. After 30 seconds, check the spark ui, storage tab. If you see
some cached RDDs listed, then the bug has been reproduced. If not, then
there is no bug... and I may be losing my mind.

Thanks in advance!

Nick





import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

val ssc = new StreamingContext(sc, Seconds(1))

// 30 one-element RDDs, consumed one per 1-second batch
val inputRDDs = mutable.Queue.tabulate(30) { i =>
  sc.parallelize(Seq(i))
}

val input = ssc.queueStream(inputRDDs)

val output = input.transform { rdd =>
  if (rdd.isEmpty()) {
    rdd
  } else {
    // cache and name the RDD so it shows up in the Storage tab of the UI
    val rdd2 = rdd.map(identity)
    rdd2.cache()
    rdd2.setName(rdd.first().toString)
    val rdd3 = rdd2.map(identity) ++ rdd2.map(identity)
    rdd3
  }
}

output.print()

ssc.start()





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Help-needed-to-reproduce-bug-tp24965.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: OutOfMemoryError

2015-10-05 Thread Jean-Baptiste Onofré

Hi Ramkumar,

did you try to increase Xmx of the workers ?
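
For example (a sketch only -- the values are placeholders and the right
numbers depend on your data and cluster; in cluster mode the driver
memory has to be set before the driver JVM starts, e.g. with
--driver-memory on spark-submit):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")  // equivalent to --executor-memory 4g
  .set("spark.driver.memory", "4g")    // in cluster mode, prefer --driver-memory on spark-submit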

Regards
JB

On 10/05/2015 08:56 AM, Ramkumar V wrote:

Hi,

When I submit a Java Spark job in cluster mode, I'm getting the following
exception.

*LOG TRACE :*

INFO yarn.ExecutorRunnable: Setting up executor with commands:
List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill
  %p', -Xms1024m, -Xmx1024m, -Djava.io.tmpdir={{PWD}}/tmp,
'-Dspark.ui.port=0', '-Dspark.driver.port=48309',
-Dspark.yarn.app.container.log.dir=, org.apache.spark.executor.CoarseGrainedExecutorBackend,
--driver-url, akka.tcp://sparkDriver@ip:port/user/CoarseGrainedScheduler,
  --executor-id, 2, --hostname, hostname , --cores, 1, --app-id,
application_1441965028669_9009, --user-class-path, file:$PWD
/__app__.jar, --user-class-path, file:$PWD/json-20090211.jar, 1>,
/stdout, 2>, /stderr).

I have a cluster of 11 machines (9 with 64 GB memory and 2 with 32 GB
memory). My input data is 128 GB in size.

How do I solve this exception? Does it depend on the driver.memory and
executor.memory settings?


*Thanks*,
<https://in.linkedin.com/in/ramkumarcs31>



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: DStream Transformation to save JSON in Cassandra 2.1

2015-10-05 Thread Jean-Baptiste Onofré

Hi Prateek,

I see two ways:

- using Cassandra CQL to adapt the RDD in the DStream to Cassandra
- using a Cassandra converter

You have a couple of code snippets in the examples. Let me know if you 
need a code sample.
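
For illustration, a minimal sketch using the DataStax
spark-cassandra-connector and reusing the jsonf DStream from your snippet
(assumptions: the connector is on the classpath and
spark.cassandra.connection.host is set; note the sample JSON uses "og"
while the table column is "oz"):

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Shape of one row of the iotdata.coordinate table
case class Coordinate(id: String, ax: Double, ay: Double, az: Double,
                      oa: Double, ob: Double, oz: Double)

val coordinates = jsonf.map { m =>
  val c = m("coordinate").asInstanceOf[Map[String, Any]]
  def d(k: String) = c(k).toString.toDouble  // values arrive as strings in the sample JSON
  // "og" from the JSON is written to the "oz" column here -- adjust if that is not intended
  Coordinate(m("id").toString, d("ax"), d("ay"), d("az"), d("oa"), d("ob"), d("og"))
}

coordinates.saveToCassandra("iotdata", "coordinate",
  SomeColumns("id", "ax", "ay", "az", "oa", "ob", "oz"))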


Regards
JB

On 10/05/2015 04:14 PM, Prateek . wrote:

Hi,

I am a beginner in Spark. This is sample data I get from a Kafka stream:

{"id": 
"9f5ccb3d5f4f421392fb98978a6b368f","coordinate":{"ax":"1.20","ay":"3.80","az":"9.90","oa":"8.03","ob":"8.8","og":"9.97"}}

   val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
   val jsonf = lines.map(JSON.parseFull(_))
     .map(_.get.asInstanceOf[scala.collection.immutable.Map[String, Any]])

   I am getting a DStream[Map[String, Any]]. I need to store the coordinate 
values in the Cassandra schema below:

CREATE TABLE iotdata.coordinate (
 id text PRIMARY KEY, ax double, ay double, az double, oa double, ob 
double, oz double
)

For this, what transformations do I need to apply before I execute 
saveToCassandra()?

Thank You,
Prateek


"DISCLAIMER: This message is proprietary to Aricent and is intended solely for the 
use of the individual to whom it is addressed. It may contain privileged or confidential 
information and should not be circulated or used for any purpose other than for what it 
is intended. If you have received this message in error, please notify the originator 
immediately. If you are not the intended recipient, you are notified that you are 
strictly prohibited from using, copying, altering, or disclosing the contents of this 
message. Aricent accepts no responsibility for loss or damage arising from the use of the 
information transmitted by this email including damage from virus."

---------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org