[no subject]

2024-03-21 Thread Рамик И
Hi!
I want to execute code inside foreachBatch that will trigger regardless of
whether there is data in the batch or not.


val kafkaReadStream = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", broker)
  .option("subscribe", topicName)
  .option("startingOffsets", startingOffsetsMode)
  .option("maxOffsetsPerTrigger", maxOffsetsPerTrigger)
  .load()


kafkaReadStream
  .writeStream
  .trigger(Trigger.ProcessingTime(s"$triggerProcessingTime seconds"))
  .foreachBatch {

  }
  .start()
  .awaitTermination()
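
Something like the sketch below is the shape I am after (assuming the usual
imports; doSomethingEveryTrigger is just a placeholder for my side effect, and
whether the batch function actually fires on empty triggers is exactly what I
am unsure about):

kafkaReadStream
  .writeStream
  .trigger(Trigger.ProcessingTime(s"$triggerProcessingTime seconds"))
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    doSomethingEveryTrigger()       // placeholder: must run on every trigger
    if (!batchDF.isEmpty) {
      // process the data only when the batch actually contains rows
    }
  }
  .start()
  .awaitTermination()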


[no subject]

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code EU 2024 are now
open!

We will be supporting Community over Code EU, Bratislava, Slovakia,
June 3rd - 5th, 2024.

TAC exists to help those who would like to attend Community over Code
events but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at https://tac.apache.org/. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people who are able to attend the full event.

Important: Applications close on Friday, March 1st, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as is
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to apply
now so that you have enough time in case of interview delays. Do not wait
until you know whether you have been accepted or not.

We look forward to greeting many of you in Bratislava, Slovakia in June,
2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


[no subject]

2023-08-23 Thread ayan guha
Unsubscribe
--
Best Regards,
Ayan Guha


[no subject]

2023-08-18 Thread Dipayan Dev
Unsubscribe --



With Best Regards,

Dipayan Dev
Author of Deep Learning with Hadoop
M.Tech (AI), IISc, Bangalore


[no subject]

2023-07-16 Thread Varun Shah
Hi Spark Community,

I am trying to set up my forked apache/spark project locally by building and
creating a package as mentioned in the docs under Running Individual Tests.
Here are the steps I have followed:
>> ./build/sbt  # this opens an sbt console
>> test         # to execute all tests

I am getting the following error and the tests are failing. Even compile /
package sbt commands fail with the same errors.

>
> [info] compiling 19 Java sources to
> forked/spark/common/network-shuffle/target/scala-2.12/test-classes ...
> [info] compiling 330 Scala sources and 29 Java sources to
> forked/spark/core/target/scala-2.12/test-classes ...
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:21:0:
> There should at least one a single empty line separating groups 3rdParty
> and spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:32:0:
> org.json4s.JsonAST.JValue should be in group 3rdParty, not spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:33:0:
> org.json4s.JsonDSL._ should be in group 3rdParty, not spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:34:0:
> org.json4s._ should be in group 3rdParty, not spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:35:0:
> org.json4s.jackson.JsonMethods._ should be in group 3rdParty, not spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:37:0:
> java.util.Locale should be in group java, not spark.
> [error]
> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:38:0:
> scala.util.control.NonFatal should be in group scala, not spark.
> [error]
> forked/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala:226:
> File line length exceeds 100 characters
> [error] stack trace is suppressed; run last catalyst / scalaStyleOnCompile
> for the full output
> [error] stack trace is suppressed; run last scalaStyleOnTest for the full
> outpu
> [error] (catalyst / scalaStyleOnCompile) Failing because of negative
> scalastyle result
> [error] (scalaStyleOnTest) Failing because of negative scalastyle result
>

Can you please guide me if I am doing something wrong? Looking forward to
getting a response soon :)
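
From the errors, the import grouping scalastyle seems to be asking for in
DataType.scala looks something like the sketch below (a guess based purely on
the messages above; blank lines separate the java, scala, 3rdParty and spark
groups, and only the imports named in the errors are shown):

import java.util.Locale

import scala.util.control.NonFatal

import org.json4s._
import org.json4s.JsonAST.JValue
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

// imports from the org.apache.spark (spark) group follow here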

Regards,
Varun Shah


[no subject]

2022-12-13 Thread yixu2...@163.com
UNSUBSCRIBE



yixu2...@163.com


[no subject]

2022-08-12 Thread GAURAV GUPTA
Unsubscribe


[no subject]

2022-07-29 Thread Milin Korath
unsubscribe




[no subject]

2022-06-10 Thread Rodrigo
Hi Everyone,



My security team has raised concerns about the requirement for root group
membership for Spark running on Kubernetes. Does anyone know the reasons
for that requirement, how insecure it is, and whether there are any alternatives?



Thanks,

Rodrigo


[no subject]

2022-04-02 Thread Sungwoo Park
Hi Spark users,

We have published an article where we evaluate the performance of Spark
2.3.8 and Spark 3.2.1 (along with Hive 3). If interested, please see:

https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/

--- SW


[no subject]

2022-02-24 Thread Luca Borin
Unsubscribe


[no subject]

2022-01-31 Thread Gaetano Fabiano
Unsubscribe 

Inviato da iPhone

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[no subject]

2022-01-31 Thread pduflot
unsubscribe

 



[no subject]

2021-11-18 Thread Sam Elamin
unsubscribe


[no subject]

2021-11-17 Thread 马殿军
unsubscribe





[no subject]

2021-11-17 Thread Fred Wang
unsubscribe


[no subject]

2021-11-12 Thread 河合亮 / KAWAI,RYOU
unsubscribe



[no subject]

2021-05-03 Thread Tianchen Zhang
Hi all,

Currently the user-facing Catalog API doesn't support backup/restore
metadata. Our customers are asking for such functionalities. Here is a
usage example:
1. Read all metadata of one Spark cluster
2. Save them into a Parquet file on DFS
3. Read the Parquet file and restore all metadata in another Spark cluster

From the current implementation, the Catalog API has the list methods
(listDatabases, listFunctions, etc.), but they don't return enough
information to restore an entity (for example, listDatabases loses the
"properties" of the database and we need "describe database extended" to
get them). And it only supports createTable (not any other entity
creations). The only way we can back up/restore an entity is using Spark SQL.

We want to introduce the backup and restore from an API level. We are
thinking of doing this simply by adding backup() and restore() in
CatalogImpl, as ExternalCatalog already includes all the methods we need to
retrieve and recreate entities. We are wondering if there is any concern or
drawback of this approach. Please advise.
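
A rough sketch of the shape we have in mind is below; the method names, the
Parquet layout and the exact ExternalCatalog calls are illustrative only:

class CatalogImpl(sparkSession: SparkSession) {  // simplified
  // ... existing Catalog methods ...

  def backup(path: String): Unit = {
    val external = sparkSession.sharedState.externalCatalog
    val databases = external.listDatabases().map(external.getDatabase)
    val tables = databases.flatMap { db =>
      external.listTables(db.name).map(t => external.getTable(db.name, t))
    }
    // serialize the CatalogDatabase / CatalogTable objects into a Parquet
    // file at `path`
  }

  def restore(path: String): Unit = {
    // read the Parquet file and replay createDatabase / createTable /
    // createFunction through ExternalCatalog on the target cluster
  }
}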

Thank you in advance,
Tianchen


[no subject]

2021-03-07 Thread Sandeep Varma
Unsubscribe

Sandeep Varma
Principal
ZS Associates India Pvt. Ltd.
World Trade Center, Tower 3, Kharadi, Pune 411014, Maharashtra, India
T  |  +91 20 6739 5224  M  |  +91 97 6633 0103
www.zs.com

ZS  Impact where it matters.











[no subject]

2020-12-16 Thread 张洪斌


Unsubscribe
Sent from NetEase Mail Master

[no subject]

2020-05-24 Thread Vijaya Phanindra Sarma B



[no subject]

2020-04-28 Thread Zeming Yu
Unsubscribe

Get Outlook for Android



[no subject]

2020-03-30 Thread Dima Pavlyshyn
Hello Apache Spark Support Team,
I am writing Spark on Java now. I use the Dataset API and I am facing an
issue. I am doing something like this:
public <K, V> Dataset<Tuple2<K, List<V>>> groupByKey(Dataset<Tuple2<K, V>> consumers, Class<K> kClass) {

    consumers.groupBy("_1").agg(collect_list(col("_2"))).printSchema();
    return consumers.groupBy("_1").agg(collect_list(col("_2")))
        .as(Encoders.tuple(Encoders.bean(kClass), Encoders.bean(List.class)));
}

And I am facing the issue that I cannot deserialize the collect_list part.
https://spark.apache.org/docs/latest/sql-reference.html#data-types -
ArrayType maps to java.util.List.
Could you please give me any suggestions? I have wasted too much time trying
to fix it.
Best Regards,
Dmytro


[no subject]

2020-03-02 Thread lucas.wu


[no subject]

2020-03-01 Thread Hamish Whittal
Hi there,

I have an hdfs directory with thousands of files. It seems that some of
them - and I don't know which ones - have a problem with their schema and
it's causing my Spark application to fail with this error:

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet
column cannot be converted in file hdfs://
ip-172-24-89-229.blaah.com:8020/user/hadoop/origdata/part-0-8b83989a-e387-4f64-8ac5-22b16770095e-c000.snappy.parquet.
Column: [price], Expected: double, Found: FIXED_LEN_BYTE_ARRAY

The problem is not only that it's causing the application to fail, but
every time it does fail, I have to copy that file out of the directory and
start the app again.

I thought of trying to use try-except, but I can't seem to get that to work.

Is there any advice anyone can give me because I really can't see myself
going through thousands of files trying to figure out which ones are broken.
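
In case it clarifies what I was attempting, here is a rough sketch of the idea
(written in Scala with Try; the directory and the "price" column come from the
error above, everything else is illustrative and not what I'd prefer to do):

import scala.util.{Failure, Success, Try}
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val files = fs.listStatus(new Path("/user/hadoop/origdata"))
  .map(_.getPath.toString)
  .filter(_.endsWith(".parquet"))

// Read each file on its own and force the problematic column to be decoded,
// so one bad file no longer kills the whole job and can be reported instead.
val broken = files.flatMap { f =>
  Try(spark.read.parquet(f).select("price").collect()) match {
    case Success(_) => None
    case Failure(e) => Some(f -> e.getMessage)
  }
}
broken.foreach { case (f, msg) => println(s"BROKEN: $f ($msg)") }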

Thanks in advance,

hamish


[no subject]

2020-01-14 Thread @Sanjiv Singh
Regards
Sanjiv Singh
Mob :  +1 571-599-5236



[no subject]

2019-09-19 Thread Georg Heiler
Hi,

How can I create an initial state by hand so that the structured streaming
file source only reads data which is semantically (i.e. using the file path,
lexicographically) greater than the minimum committed initial state?

Details here:
https://stackoverflow.com/questions/58004832/spark-structured-streaming-file-source-read-from-a-certain-partition-onwards

Best,
Georg


[no subject]

2019-07-21 Thread Hieu Nguyen
Hi Spark communities,

I just found out that in
https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.fullOuterJoin,
the documentation is "Perform a right outer join of self and other." It
should be a full outer join, not a right outer join, as shown in the
example and the method name.

Please check it out. Thanks!

Yours sincerely,

Hieu Nguyen


[no subject]

2019-06-06 Thread Shi Tyshanchn



[no subject]

2019-04-08 Thread Siddharth Reddy
unsubscribe


[no subject]

2019-03-29 Thread Daniel Sierra
unsubscribe


[no subject]

2019-03-13 Thread Anbazhagan Muthuramalingam
SUBSCRIBE


[no subject]

2019-03-05 Thread Shyam P
Hi All,
  I need to save a huge DataFrame as a Parquet file. As it is huge, it is
taking several hours. To improve performance, I understand I have to write it
group-wise.

But when I do partitionBy(columns*) / groupBy(columns*), the driver spills
a lot of data and performance suffers badly again.

So how do I handle this situation and save one group after another?

Attaching the sample scenario of the same.

https://stackoverflow.com/questions/54416623/how-to-group-dataframe-year-wise-and-iterate-through-groups-and-send-each-year-d
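
For reference, a minimal sketch of the group-wise write I have been trying
("year" is just an example grouping column, the output path is illustrative):

df.write
  .partitionBy("year")
  .mode("overwrite")
  .parquet("/path/to/output")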

Highly appreciate your help.

Thanks,
Shyam


[no subject]

2019-02-13 Thread Kumar sp



[no subject]

2019-01-31 Thread Ahmed Abdulla
unsubscribe


[no subject]

2019-01-31 Thread Daniel O' Shaughnessy
unsubscribe


[no subject]

2019-01-30 Thread Daniel O' Shaughnessy
Unsubscribe


[no subject]

2018-12-19 Thread Daniel O' Shaughnessy
unsubscribe


[no subject]

2018-11-08 Thread JF Chen
I am working on a Spark Streaming application, and I want it to read
configuration from MongoDB every hour, where the batch interval is 10
minutes.
Is that practical? As I understand it, Spark Streaming batches are tied to
the DStream, so how do I implement this function, which does not seem
related to the DStream data?
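
Something like the sketch below is what I have in mind; loadConfigFromMongo is
a hypothetical helper and the refresh runs on the driver inside foreachRDD:

var lastRefresh = 0L
var config: Map[String, String] = Map.empty

dstream.foreachRDD { rdd =>
  val now = System.currentTimeMillis()
  if (now - lastRefresh >= 60 * 60 * 1000L) {   // re-read the config hourly
    config = loadConfigFromMongo()              // hypothetical helper
    lastRefresh = now
  }
  // use `config` while processing rdd ...
}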


Regard,
Junfeng Chen


[no subject]

2018-05-16 Thread Davide Brambilla
Hi all,
   we have a dataframe with 1000 partitions and we need to write the
dataframe into MySQL using this command:

df.coalesce(20)
df.write.jdbc(url=url,
  table=table,
  mode=mode,
  properties=properties)

and we get these errors randomly

java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at
org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:125)
at
org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:124)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:126)
at
org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:520)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:693)
at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:753)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1690)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1678)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1677)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1677)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:855)
at scala.Option.foreach(Option.scala:257)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:855)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1905)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1860)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1849)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:671)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:446)
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at 

[no subject]

2018-05-02 Thread Filippo Balicchia



[no subject]

2017-10-22 Thread 梁义怀


[no subject]

2017-09-08 Thread PICARD Damien
Hi !

I'm facing a Classloader problem using Spark 1.5.1

I use javax.validation and hibernate validation annotations on some of my beans 
:

  @NotBlank
  @Valid
  private String attribute1 ;

  @Valid
  private String attribute2 ;

When Spark tries to unmarshall these beans (after a remote RDD), I get the 
ClassNotFoundException :
17/09/07 09:19:25 INFO storage.BlockManager: Found block rdd_8_1 remotely
17/09/07 09:19:25 ERROR executor.Executor: Exception in task 3.0 in stage 2.0 
(TID 6)
java.lang.ClassNotFoundException: org.hibernate.validator.constraints.NotBlank
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:700)
at java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1566)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
   ...

Indeed, it means that the annotation class is not found, because it is not in
the classpath. Why? I don't know, because I build an uber JAR that contains
this class. I suppose that at the time the job tries to unmarshall the RDD,
the uber jar is not yet loaded.

So I tried to add the Hibernate JAR to the classloader manually, using this
spark-submit command:

spark-submit --queue default \
--class com.my.Launcher \
--deploy-mode cluster \
--master yarn-cluster \
--driver-java-options "-Dfile.encoding=UTF-8" \
--jars /home/user/hibernate-validator-5.2.2.Final.jar \
--driver-class-path /home/user/hibernate-validator-5.2.2.Final.jar \
--conf 
"spark.executor.extraClassPath=/home/user/hibernate-validator-5.2.2.Final.jar" \
/home/user/uberjar-job.jar

Without effect. So, is there a way to add this class to the classloader?

Thank you in advance.

Damien


[no subject]

2017-08-07 Thread Sumit Saraswat
Unsubscribe


[no subject]

2017-06-07 Thread Patrik Medvedev
Hello guys,

I need to execute Hive queries on a remote Hive server from Spark, but for
some reason I receive only column names (without data).
The data is available in the table; I checked it via HUE and a Java JDBC
connection.

Here is my code example:
val test = spark.read
.option("url", "jdbc:hive2://remote.hive.server:1/work_base")
.option("user", "user")
.option("password", "password")
.option("dbtable", "some_table_with_data")
.option("driver", "org.apache.hive.jdbc.HiveDriver")
.format("jdbc")
.load()
test.show()


Scala version: 2.11
Spark version: 2.1.0, i also tried 2.1.1
Hive version: CDH 5.7 Hive 1.1.1
Hive JDBC version: 1.1.1

This problem is present with later Hive versions, too.
Could you help me with this issue? I didn't find anything in the mailing
list answers or on StackOverflow.

-- 
*Cheers,*
*Patrick*


[no subject]

2017-05-26 Thread Anton Kravchenko
df.rdd.foreachPartition(convert_to_sas_single_partition)

def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = {
  for (irow <- ipartition) {
    // process each Row here
  }
}

[no subject]

2017-03-09 Thread sathyanarayanan mudhaliyar
I am using Spark Streaming for a basic streaming movie count program.
First I have mapped the year and movie name to a JavaPairRDD, and
I am using reduceByKey for counting the movies year-wise.

I am using Cassandra for output; the Spark Streaming application is not
stopping and Cassandra is also not showing any output. I think the data
may be stuck in staging. If so, how do I stop my Spark Streaming application?

regards,
Sathya



[no subject]

2017-03-08 Thread sathyanarayanan mudhaliyar
code:
directKafkaStream.foreachRDD(rdd ->
{
    rdd.foreach(record ->
    {
        messages1.add(record._2);
    });
    JavaRDD<String> lines = sc.parallelize(messages1);
    JavaPairRDD<Integer, String> data =
        lines.mapToPair(new PairFunction<String, Integer, String>()
    {
        @Override
        public Tuple2<Integer, String> call(String a)
        {
            String[] tokens = StringUtil.split(a, '%');
            return new Tuple2<Integer, String>(Integer.getInteger(tokens[3]), tokens[2]);
        }
    });
    Function2<String, String, String> reduceSumFunc =
        (accum, n) -> (accum.concat(n));
    JavaPairRDD<Integer, String> yearCount =
        data.reduceByKey(reduceSumFunc);

    javaFunctions(yearCount).writerBuilder("movie_keyspace", "movie_count",
        mapTupleToRow(Integer.class, String.class))
        .withColumnSelector(someColumns("year", "list_of_movies"))
        .saveToCassandra(); // this is the error line
});




--


error:
com.datastax.spark.connector.writer.NullKeyColumnException:
Invalid null value for key column year
at 
com.datastax.spark.connector.writer.RoutingKeyGenerator$$anonfun$fillRoutingKey$1.apply$mcVI$sp(RoutingKeyGenerator.scala:49)
at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at 
com.datastax.spark.connector.writer.RoutingKeyGenerator.fillRoutingKey(RoutingKeyGenerator.scala:47)
at 
com.datastax.spark.connector.writer.RoutingKeyGenerator.apply(RoutingKeyGenerator.scala:56)
at 
com.datastax.spark.connector.writer.TableWriter.batchRoutingKey(TableWriter.scala:126)
at 
com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1$$anonfun$19.apply(TableWriter.scala:151)
at 
com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1$$anonfun$19.apply(TableWriter.scala:151)
at 
com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:107)
at 
com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:31)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at 
com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
at 
com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:158)
at 
com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:135)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
at 
com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
at 
com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:140)
at 
com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at 
com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:135)
at 
com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
at 
com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


--

I am trying to connect Kafka and Cassandra using Spark.
I am able to store a JavaRDD but not able to store a JavaPairRDD into Cassandra.
I have added a comment on the line where the error occurs.
Thank you

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[no subject]

2016-12-20 Thread satyajit vegesna
Hi All,

PFB sample code ,

val df = spark.read.parquet()
df.registerTempTable("df")
val zip = df.select("zip_code").distinct().as[String].rdd


def comp(zipcode:String):Unit={

val zipval = "SELECT * FROM df WHERE
zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
val data = spark.sql(zipval) //Throwing null pointer exception with RDD
data.write.parquet(..)

}

val sam = zip.map(x => comp(x))
sam.count

But when I do val zip =
df.select("zip_code").distinct().as[String].rdd.collect and call the
function, the data does get computed, but in sequential order.

I would like to know why, when I try running map on the RDD, I get a null
pointer exception, and whether there is a way to compute the comp function
for each zipcode in parallel, i.e. run multiple zipcodes at the same time.

Any clue or inputs are appreciated.

Regards.


[no subject]

2016-12-06 Thread ayan guha
Hi

We are generating some big model objects


> hdfs dfs -du -h /myfolder
325  975  /myfolder/__ORCHMETA__
1.7 M5.0 M/myfolder/model
185.3 K  555.9 K  /myfolder/predict

The issue I am facing while loading is

Error in .jcall("com/oracle/obx/df/OBXSerializer", returnSig =
"Ljava/lang/Object;",  :
  java.io.StreamCorruptedException: invalid type code: 00
Calls: orch.load.model ->  -> .jcall -> .jcheck -> .Call
Execution halted


As you can guess, it is from Oracle's R for Hadoop (ORAAH) product. I am
going to raise a ticket with Oracle, but thought to check here as well in
case there are any solutions available.



-- 
Best Regards,
Ayan Guha


[no subject]

2016-11-28 Thread Didac Gil
Any suggestions for using something like OneHotEncoder and StringIndexer on
an InputDStream?

I could try to combine an Indexer based on a static parquet but I want to
use the OneHotEncoder approach in Streaming data coming from a socket.

Thanks!

Dídac Gil de la Iglesia


[no subject]

2016-11-24 Thread Rostyslav Sotnychenko


[no subject]

2016-10-10 Thread Fei Hu
Hi All,

I am running some Spark Scala code in Zeppelin on CDH 5.5.1 (Spark version
1.5.0). I customized the Spark interpreter to use
org.apache.spark.serializer.KryoSerializer as spark.serializer, and in the
dependencies I added Kryo 3.0.3 as follows:
 com.esotericsoftware:kryo:3.0.3


When I write the Scala notebook and run the program, I get the following
errors. But if I compile the same code into jars and use spark-submit to run
it on the cluster, it works well without errors.

WARN [2016-10-10 23:43:40,801] ({task-result-getter-1}
Logging.scala[logWarning]:71) - Lost task 0.0 in stage 3.0 (TID 9,
svr-A3-A-U20): java.io.EOFException

at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:196)

at
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:217)

at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1175)

at
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)

at
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)

at
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)

at
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)

at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)

at org.apache.spark.scheduler.Task.run(Task.scala:88)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


There were also some errors when I run the Zeppelin Tutorial:

Caused by: java.io.IOException: java.lang.NullPointerException

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)

at
org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)

at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)

at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)

at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)

at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)

at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)

at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)

at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)

at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)

... 3 more

Caused by: java.lang.NullPointerException

at
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:38)

at
com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:23)

at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)

at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1$$anonfun$apply$mcV$sp$2.apply(ParallelCollectionRDD.scala:80)

at
org.apache.spark.util.Utils$.deserializeViaNestedStream(Utils.scala:142)

at
org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:80)

at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)

Does anyone know why this happened?

Thanks in advance,
Fei


[no subject]

2016-10-06 Thread ayan guha
Hi

Faced one issue:

- Writing Hive Partitioned table using

df.withColumn("partition_date",to_date(df["INTERVAL_DATE"])).write.partitionBy('partition_date').saveAsTable("sometable",mode="overwrite")

- Data got written to HDFS fine. I can see the folders with partition names
such as

/app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
/app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29

and so on.
- Also, _common_metadata & _metadata files are written properly

- I can read data from spark fine using
read.parquet("/app/somedb/hive/somedb.db/sometable"). Printschema showing
all columns.

- However, I cannot read the table from Hive.

Problem 1: Hive does not think the table is partitioned.
Problem 2: Hive sees only 1 column (array from deserializer).
Problem 3: MSCK REPAIR TABLE failed, saying the partitions are not in the metadata.

Question: Is this a known issue with Spark writing to Hive partitioned tables?


-- 
Best Regards,
Ayan Guha


[no subject]

2016-08-14 Thread Jestin Ma
Hi, I'm currently trying to perform an outer join between two
DataFrames/Sets, one is ~150GB, one is about ~50 GB on a column, id.

df1.id is skewed in that there are many 0's, the rest being unique IDs.

df2.id is not skewed. If I filter df1.id != 0, then the join works well. If
I don't, then the join does not complete for a very, very long time.

I have diagnosed this problem due to the hashpartitioning on IDs, resulting
in one partition containing many values due to data skew. One executor ends
up reading most of the shuffle data, and writing all of the shuffle data,
as shown below.





Shown above is the task in question assigned to one executor.



This screenshot comes from one of the executors, showing one single thread
spilling sort data since the executor cannot hold 90%+ of the ~200 GB
result in memory.

Moreover, looking at the event timeline, I find that the executor on that
task spends about 20% time reading shuffle data, 70% computation, and 10%
writing output data.

I have tried the following:


   - "Salting" the 0-value keys by monotonically_increasing_id().mod(N)
   - - This doesn't seem to have an effect since now I have
   hundreds/thousands of keys with tens of thousands of occurrences.
   - - Should I increase N? Is there a way to just do random.mod(N) instead
   of monotonically_increasing_id()?
   -
   - Repartitioning according to column I know contains unique values
   -
   - - This is overridden by Spark's sort-based shuffle manager which hash
   repartitions on the skewed column
   -
   - - Is it possible to change this? Or will the join column need to be
   hashed and partitioned on for joins to work
   -
   - Broadcasting does not work for my large tables
   -
   - Increasing/decreasing spark.sql.shuffle.partitions does not remedy the
   skewed data problem as 0-product values are still being hashed to the same
   partition.
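
A sketch of the salting I tried (column and variable names are illustrative,
and the outer-join handling of salted keys is simplified):

import org.apache.spark.sql.functions._

val N = 100  // number of salt buckets for the skewed key

// Add a random salt on the skewed side so key 0 is spread over N partitions.
val df1Salted = df1.withColumn("salt", (rand() * N).cast("int"))

// Replicate the other side once per salt value so every salted key can match.
val df2Salted = df2.withColumn("salt", explode(array((0 until N).map(lit): _*)))

val joined = df1Salted.join(df2Salted, Seq("id", "salt"), "outer")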


--

What I am considering currently is doing the join at the RDD level, but is
there any level of control which can solve my skewed data problem? Other
than that, see the bolded question.

I would appreciate any suggestions/tips/experience with this. Thank you!


[no subject]

2016-07-08 Thread tan shai
Hi,

Can anyone explain to me the RangePartitioning class?
https://github.com/apache/spark/blob/d5911d1173fe0872f21cae6c47abf8ff479345a4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala

case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
  extends Expression with Partitioning with Unevaluable {

  override def children: Seq[SortOrder] = ordering
  override def nullable: Boolean = false
  override def dataType: DataType = IntegerType

  override def satisfies(required: Distribution): Boolean = required match {
    case UnspecifiedDistribution => true
    case OrderedDistribution(requiredOrdering) =>
      val minSize = Seq(requiredOrdering.size, ordering.size).min
      requiredOrdering.take(minSize) == ordering.take(minSize)
    case ClusteredDistribution(requiredClustering) =>
      ordering.map(_.child).forall(x =>
        requiredClustering.exists(_.semanticEquals(x)))
    case _ => false
  }

  override def compatibleWith(other: Partitioning): Boolean = other match {
    case o: RangePartitioning => this.semanticEquals(o)
    case _ => false
  }

  override def guarantees(other: Partitioning): Boolean = other match {
    case o: RangePartitioning => this.semanticEquals(o)
    case _ => false
  }
}


[no subject]

2016-06-29 Thread pooja mehta
Hi,

I want to add a metadata field to the StructField case class in Spark:

case class StructField(name: String)

And how do I carry the metadata over in query execution?
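
For reference, a sketch of the kind of metadata handling I mean (assuming
Spark's MetadataBuilder API; names and values are illustrative):

import org.apache.spark.sql.types._

val meta   = new MetadataBuilder().putString("comment", "user id column").build()
val field  = StructField("id", LongType, nullable = false, metadata = meta)
val schema = StructType(Seq(field))

// The metadata is visible on the schema of a DataFrame built with this StructType.
println(schema("id").metadata.getString("comment"))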


[no subject]

2016-06-24 Thread Rama Perubotla
Unsubscribe


[no subject]

2016-06-10 Thread pooja mehta
Hi,

How can I use a Scala UDF from the Beeline client?
With the spark-shell, we register our UDF like this:
sqlcontext.udf.register().
What is the way to use the UDF from the Beeline client?

Thanks
Pooja


[no subject]

2016-04-11 Thread Angel Angel
Hello,

I am writing a Spark application; it runs well but takes a long execution
time. Can anyone help me optimize my query to increase the processing speed?


I am writing an application in which I have to construct histograms and
compare them in order to find the final candidate.


In my code I read the text file, match the first field, subtract the second
field from the matched candidates, and update the table.

Here is my code. Please help me optimize it.


val sqlContext = new org.apache.spark.sql.SQLContext(sc)


import sqlContext.implicits._


val Array_Ele =
sc.textFile("/root/Desktop/database_200/patch_time_All_20_modified_1.txt").flatMap(line=>line.split("
")).take(900)


val df1=
sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database_modified_No_name_400.parquet")


var k = df1.filter(df1("Address").equalTo(Array_Ele(0) ))

var a= 0


for( a <-2 until 900 by 2){

k=k.unionAll(
df1.filter(df1("Address").equalTo(Array_Ele(a))).select(df1("Address"),df1("Couple_time")-Array_Ele(a+1),df1("WT_ID")))}


k.cache()


val WT_ID_Sort  = k.groupBy("WT_ID").count().sort(desc("count"))


val temp = WT_ID_Sort.select("WT_ID").rdd.map(r=>r(0)).take(10)


val Table0=
k.filter(k("WT_ID").equalTo(temp(0))).groupBy("Couple_time").count().select(max($"count")).show()

val Table1=
k.filter(k("WT_ID").equalTo(temp(1))).groupBy("Couple_time").count().select(max($"count")).show()

val Table2=
k.filter(k("WT_ID").equalTo(temp(2))).groupBy("Couple_time").count().select(max($"count")).show()

val Table3=
k.filter(k("WT_ID").equalTo(temp(3))).groupBy("Couple_time").count().select(max($"count")).show()

val Table4=
k.filter(k("WT_ID").equalTo(temp(4))).groupBy("Couple_time").count().select(max($"count")).show()

val Table5=
k.filter(k("WT_ID").equalTo(temp(5))).groupBy("Couple_time").count().select(max($"count")).show()

val Table6=
k.filter(k("WT_ID").equalTo(temp(6))).groupBy("Couple_time").count().select(max($"count")).show()

val Table7=
k.filter(k("WT_ID").equalTo(temp(7))).groupBy("Couple_time").count().select(max($"count")).show()

val Table8=
k.filter(k("WT_ID").equalTo(temp(8))).groupBy("Couple_time").count().select(max($"count")).show()



val Table10=
k.filter(k("WT_ID").equalTo(temp(10))).groupBy("Couple_time").count().select(max($"count")).show()


val Table11=
k.filter(k("WT_ID").equalTo(temp(11))).groupBy("Couple_time").count().select(max($"count")).show()


And lastly, how can I compare all these tables to find the maximum
value?
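
A sketch of the comparison I am trying to express (same column names as
above): compute the per-WT_ID histogram maxima in one job and take the best.

import org.apache.spark.sql.functions._

val histMax = k
  .filter(k("WT_ID").isin(temp: _*))            // only the top-10 WT_IDs
  .groupBy("WT_ID", "Couple_time").count()
  .groupBy("WT_ID").agg(max($"count").as("max_count"))

histMax.orderBy(desc("max_count")).show()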




Thanks,


[no subject]

2016-04-02 Thread Hemalatha A
Hello,

The Spark programming guide says "we should have 2-4 partitions for
each CPU in your cluster." In this case, how does 1 CPU core process 2-4
partitions at the same time?
Link - http://spark.apache.org/docs/latest/programming-guide.html (under
Rdd section)

Does it do context switching between tasks or run them in parallel? If it
does context switching how is it efficient compared to 1:1 partition vs
Core?

PS: If we are using the Kafka direct API, in which Kafka partitions = RDD
partitions, does that mean we should have 40 Kafka partitions for 10 CPU
cores?

-- 


Regards
Hemalatha


Spark thrift issue 8659 (changing subject)

2016-03-23 Thread ayan guha
>
> Hi All
>
> I found this issue listed in Spark Jira -
> https://issues.apache.org/jira/browse/SPARK-8659
>
> I would love to know if there are any roadmap for this? Maybe someone from
> dev group can confirm?
>
> Thank you in advance
>
> Best
> Ayan
>
>


[no subject]

2016-03-20 Thread Vinay Varma



[no subject]

2016-01-11 Thread Daniel Imberman
Hi all,

I'm looking for a way to efficiently partition an RDD, but allow the same
data to exists on multiple partitions.


Lets say I have a key-value RDD with keys {1,2,3,4}

I want to be able to to repartition the RDD so that so the partitions look
like

p1 = {1,2}
p2 = {2,3}
p3 = {3,4}

Locality is important in this situation as I would be doing internal
comparisons of values.

Does anyone have any thoughts as to how I could go about doing this?
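
One shape I have been sketching (the window ids and partitioner are
illustrative only): duplicate each record into every "window" it belongs to,
then partition by the window id so the overlapping copies land on their own
partitions.

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

val rdd: RDD[(Int, String)] = ???   // the original key-value RDD with keys 1..4

// Key k is copied into windows k-1 and k, so window 1 = {1,2}, window 2 =
// {2,3}, window 3 = {3,4} (the boundary windows 0 and 4 would need trimming).
val duplicated = rdd.flatMap { case (k, v) =>
  Seq((k - 1, (k, v)), (k, (k, v)))             // window id -> original record
}

val windowed = duplicated.partitionBy(new HashPartitioner(3))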

Thank you


[no subject]

2016-01-08 Thread Suresh Thalamati



[no subject]

2015-12-04 Thread Sateesh Karuturi
user-sc.1449231970.fbaoamghkloiongfhbbg-sateesh.karuturi9=
gmail@spark.apache.org


[no subject]

2015-11-26 Thread Dmitry Tolpeko



[no subject]

2015-11-19 Thread aman solanki
Hi All,

I want to know how one can get historical data on jobs, stages, tasks etc. of
a running Spark application.

Please share the information regarding the same.

Thanks,
Aman Solanki


[no subject]

2015-10-15 Thread Lei Wu
Dear all,

Like the design doc in SPARK-1 for Spark memory management, is there a
design doc for Spark task scheduling details ? I'd really like to dive deep
into the task scheduling module of Spark, thanks so much !


[no subject]

2015-10-15 Thread Anfernee Xu
Sorry, I have to re-send this again as I did not get an answer.

Here's the problem I'm facing: I have a standalone Java application which
periodically submits Spark jobs to my YARN cluster; btw, I'm not using
'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs
are successful and I can see them on the YARN RM web UI, but when I want to
follow the link to the app history on the Spark history server, I always get
a 404 (application is not found) from the Spark history server.

My code looks likes as below


SparkConf conf = new
SparkConf().setAppName("testSpak").setMaster("yarn-client")
.setJars(new String[]{IOUtil.getJar(MySparkApp.class)});

conf.set("spark.yarn.historyServer.address", "10.247.44.155:18080");
conf.set("spark.history.fs.logDirectory",
"
hdfs://myHdfsNameNode:55310/scratch/tie/spark/applicationHistory");

JavaSparkContext sc = new JavaSparkContext(conf);

try {

 ... my application code

}finally{
  sc.stop();

}

Anything I did wrong or missed? Do I need to configure something on Yarn
side?

Thanks

-- 
--Anfernee


[no subject]

2015-07-10 Thread satish chandra j
Hi All,
I am having issues making an external jar available to the Spark shell.
I have used the --jars option while starting the Spark shell to make it
available.
When I give the command Class.forName("org.postgresql.Driver") it does not
give any error, but when an action operation is performed on the RDD I get
the typical "No suitable driver found for jdbc:postgresql://"

Please provide a solution if anybody has faced and fixed the same issue.

Regards,
Satish Chandra


[no subject]

2015-07-07 Thread Anand Nalya
Hi,

Suppose I have an RDD that is loaded from some file and then I also have a
DStream that has data coming from some stream. I want to keep union some of
the tuples from the DStream into my RDD. For this I can use something like
this:

  var myRDD: RDD[(String, Long)] = sc.fromText...
  dstream.foreachRDD { rdd =>
    myRDD = myRDD.union(rdd.filter(myfilter))
  }

My question is, for how long will Spark keep the RDDs underlying the
DStream around? Is there some configuration knob that can control that?

Regards,
Anand


[no subject]

2015-07-07 Thread 付雅丹
Hi, everyone!

I've got key/value pairs in the form of <LongWritable, Text>, where I used
the following code:

SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
JavaSparkContext sc = new JavaSparkContext(conf);
Configuration confHadoop = new Configuration();

JavaPairRDD<LongWritable, Text> sourceFile = sc.newAPIHadoopFile(
    "hdfs://cMaster:9000/wcinput/data.txt",
    DataInputFormat.class, LongWritable.class, Text.class, confHadoop);

Now I want to transform the JavaPairRDD data from <LongWritable, Text> to
another <LongWritable, Text>, where the Text content is different. After
that, I want to write the Text into HDFS in order of the LongWritable value.
But I don't know how to write the map/reduce functions in Spark using Java.
Can someone help me?


Sincerely,
Missie.


[no subject]

2015-06-23 Thread ๏̯͡๏
I have a Spark job that has 7 stages. The first 3 stages complete and the
fourth stage begins (it joins two RDDs). This stage has multiple task
failures, all with the below exception.

Multiple tasks (100s of them) get the same exception with different hosts.
How can all the hosts suddenly stop responding when a few moments ago 3
stages ran successfully? If I re-run the job, the three stages will again run
successfully. I cannot think of it being a cluster issue.


Any suggestions ?


Spark Version : 1.3.1

Exception:

org.apache.spark.shuffle.FetchFailedException: Failed to connect to HOST
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at 
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:125)
at org.apache.sp


-- 
Deepak


[no subject]

2015-06-17 Thread Nipun Arora
Hi,

Is there any way in Spark Streaming to keep data across multiple
micro-batches, like in a HashMap or something?
Can anyone make suggestions on how to keep data across iterations, where
each iteration is an RDD being processed in a JavaDStream?

This is especially the case when I am trying to update a model or compare
two sets of RDDs, or keep a global history of certain events etc. which
will impact operations in future iterations.
I would like to keep some accumulated history to make calculations: not
the entire dataset, but persist certain events which can be used in future
JavaDStream RDDs.
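
One thing I have looked at is updateStateByKey, e.g. something like this
sketch that carries a running count across micro-batches (shown in Scala for
brevity, I am on the Java API; `events` is an illustrative DStream[String]
and a checkpoint directory is required), but I am not sure it fits:

ssc.checkpoint("/tmp/checkpoints")

val counts = events
  .map(e => (e, 1L))
  .updateStateByKey[Long] { (newValues: Seq[Long], state: Option[Long]) =>
    Some(state.getOrElse(0L) + newValues.sum)
  }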

Thanks
Nipun


[no subject]

2015-06-11 Thread Wangfei (X)
Hi, all

   We use Spark SQL to insert data from a text table into a partitioned table
and found that if we give more cores to the executors, the insert performance
gets worse.



executors num    total-executor-cores    average time for insert task

3                3                       1.7 min

3                6                       3 min

3                9                       4.8 min





Does anyone have an idea?



following is the detail info of out test.





cluster:
4 nodes, each node 300+ g memory and 24 cores.
hadoop: 1 namenode + 3 datanodes.
spark: standalone mode, 1 master + 3 workers.



spark version:

apache master branch,  the current commit is

commit a777eb04bf981312b640326607158f78dd4163cd

Author: Patrick Wendell <patr...@databricks.com>
Date:   Wed Jun 10 21:13:47 2015 -0700

[HOTFIX] Adding more contributor name bindings


sql:

1. Create an external table like:

CREATE EXTERNAL TABLE tableA (
   f1 string,
   f2 string,
   f3 bigint,
   f4 smallint,
   f5 smallint,
   f6 string,
   f7 smallint,
   f8 string,
   f9 string,
   f10 smallint,
   f11 string,
   f12 bigint,
   f13 bigint,
   f14 bigint,
   f15 bigint,
   f16 string,
   f17 string,
   f18 smallint,
   f19 string,
   f20 string,
   f21 string,
   f22 string,
   f23 string,
   f24 smallint,
   f25 smallint,
   f26 bigint,
   f27 smallint,
   f28 bigint,
   f29 string,
   f30 bigint,
   f31 bigint,
   f32 bigint,
   f33 smallint,
   f34 smallint,
   f35 smallint,
   f36 smallint,
   f37 smallint,
   f38 smallint,
   f39 string,
   f40 smallint,
   f41 string,
   f42 smallint,
   f43 string,
   f44 smallint,
   f45 smallint,
   f46 smallint,
   f47 string,
   f48 smallint,
   f49 smallint,
   f50 string,
   f51 string,
   f52 smallint,
   f53 int,
   f54 bigint,
   f55 bigint)
row format delimited fields terminated by '|'
STORED AS textfile
LOCATION '/data';



2. Create a partitioned table:



CREATE EXTERNAL TABLE tableB (
   f1 string,
   f2 string,
   f3 bigint,
   f4 smallint,
   f5 smallint,
   f6 string,
   f7 smallint,
   f8 string,
   f9 string,
   f10 smallint,
   f11 string,
   f12 bigint,
   f13 bigint,
   f14 bigint,
   f15 bigint,
   f16 string,
   f17 string,
   f18 smallint,
   f19 string,
   f20 string,
   f21 string,
   f22 string,
   f23 string,
   f24 smallint,
   f25 smallint,
   f26 bigint,
   f27 smallint,
   f28 bigint,
   f29 string,
   f30 bigint,
   f31 bigint,
   f32 bigint,
   f33 smallint,
   f34 smallint,
   f35 smallint,
   f36 smallint,
   f37 smallint,
   f38 smallint,
   f39 string,
   f40 smallint,
   f41 string,
   f42 smallint,
   f43 string,
   f44 smallint,
   f45 smallint,
   f46 smallint,
   f47 string,
   f48 smallint,
   f49 smallint,
   f50 string,
   f51 string,
   f52 smallint,
   f53 int,
   f54 bigint,
   f55 bigint) partitioned by (hour int, id string);



3 do insert



insert into table tableB partition (hour,last_msisdn) select *, 
hour(from_unixtime(f3,'-MM-dd HH:mm:ss')),substr(f9,-1)  from tableA;




[no subject]

2015-05-06 Thread anshu shukla
Exception with sample testing in Intellij IDE:

Exception in thread main java.lang.NoClassDefFoundError:
scala/collection/GenTraversableOnce$class
at akka.util.Collections$EmptyImmutableSeq$.init(Collections.scala:15)
at akka.util.Collections$EmptyImmutableSeq$.clinit(Collections.scala)
at akka.japi.Util$.immutableSeq(JavaAPI.scala:229)
at akka.remote.RemoteSettings.init(RemoteSettings.scala:30)
at
akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:114)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:191)
at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:230)
at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
at akka.actor.ActorSystemImpl.init(ActorSystem.scala:577)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1837)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1828)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.init(SparkContext.scala:272)
*at Testspark$.main(Testspark.scala:17)*
at Testspark.main(Testspark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException:
scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 38 more



*Code is Testspark.scala-*

/**
 * Created by anshushukla on 07/05/15.
 */
import org.apache.spark.{SparkConf, SparkContext}


object Testspark {


  def main (args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("TestSpark")

    val sc = new SparkContext(conf) // line number 17 showing exception

    val data = sc.parallelize(1 to 10).collect().filter(_ < 1000)
    data.foreach(println)

  }

}


*build.sbt is -*

name := "scala-test-workspace"

version := "1.0"

scalaVersion := "2.11.6"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
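A hedged observation, not from the original thread: a NoClassDefFoundError for
scala.collection.GenTraversableOnce$class is the usual symptom of mixing Scala
binary versions, and here scalaVersion is 2.11.6 while the Spark artifact carries
the _2.10 suffix. A consistent sketch (version numbers are illustrative) would be:

name := "scala-test-workspace"

version := "1.0"

scalaVersion := "2.10.4"

// %% lets sbt append the matching _2.10 suffix automatically
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.1"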


-- 
Thanks & Regards,
Anshu Shukla
Indian Institute of Science


[no subject]

2015-03-25 Thread Himanish Kushary
Hi,

I have an RDD of pairs of strings like below:

(A,B)
(B,C)
(C,D)
(A,D)
(E,F)
(B,F)

I need to transform/filter this into an RDD of pairs that does not repeat a
string once it has been used. So something like,

(A,B)
(C,D)
(E,F)

(B,C) is out because B has already been used in (A,B); (A,D) is out because
A (and D) have already been used, etc.

I was thinking of the option of using a shared variable to keep track of what
has already been used, but that may only work for a single partition and
would not scale to a larger dataset.

Is there any other efficient way to accomplish this?
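For reference, a minimal, non-distributed sketch of the filtering described above,
just to pin down the semantics (it walks one ordinary sequence, so it is not the
scalable answer being asked for):

val pairs = Seq(("A","B"), ("B","C"), ("C","D"), ("A","D"), ("E","F"), ("B","F"))
val (kept, _) = pairs.foldLeft((Vector.empty[(String, String)], Set.empty[String])) {
  case ((acc, used), (a, b)) =>
    if (used(a) || used(b)) (acc, used)          // one of the strings was already taken
    else (acc :+ ((a, b)), used + a + b)         // keep the pair, mark both strings as used
}
// kept == Vector(("A","B"), ("C","D"), ("E","F"))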

-- 
Thanks & Regards
Himanish


[no subject]

2015-03-23 Thread Udbhav Agarwal
Hi,
I am querying HBase via Spark SQL with the Java APIs.
Step 1:
creating JavaPairRDD, then JavaRDD, then JavaSchemaRDD (via applySchema) objects.
Step 2:
sqlContext.sql(<sql query>).

If I am updating my HBase database between these two steps (via the HBase shell in some
other console), the query in step two is not picking up the updated data from the
table. It's showing the old data. Can somebody tell me how to let Spark know that I have
updated my database after Spark has created the RDDs?




Thanks,
Udbhav Agarwal



[no subject]

2015-03-16 Thread Hector




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[no subject]

2015-03-03 Thread shahab
I did an experiment with the Hive and SQL contexts: I queried Cassandra
using CassandraAwareSQLContext
(a custom SQL context from Calliope), then I registered the RDD as a
temp table, and next I tried to query it using HiveContext, but it seems that
the Hive context cannot see the table registered using the SQL context. Is this a
normal case?

Stack trace:

 ERROR hive.ql.metadata.Hive -
NoSuchObjectException(message:default.MyTableName table not found)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(
HiveMetaStore.java:1373)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(
RetryingHMSHandler.java:103)
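For what it's worth, a minimal sketch of the scoping behaviour (not the Calliope
setup from the thread): temp tables registered on one SQLContext are only visible
from that same context, so registering and querying through a single HiveContext
avoids the lookup failure. Spark 1.2-style API; the case class, data, and names
below are illustrative.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Event(id: Int, value: String)

val sc = new SparkContext(new SparkConf().setAppName("temp-table-scope").setMaster("local[2]"))
val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD   // implicit conversion RDD[Event] -> SchemaRDD

val events = sc.parallelize(Seq(Event(1, "a"), Event(2, "b")))
events.registerTempTable("MyTableName")                  // registered on hiveContext
hiveContext.sql("SELECT COUNT(*) FROM MyTableName").collect().foreach(println)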

best,
/Shahab


[no subject]

2015-03-03 Thread Jianshi Huang
Hi,

I got this error message:

15/03/03 10:22:41 ERROR OneForOneBlockFetcher: Failed while starting block
fetches
java.lang.RuntimeException: java.io.FileNotFoundException:
/hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
(No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.init(FileInputStream.java:146)
at
org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
at
org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
at
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at
org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)


And then for the same index file and executor, I got the following errors
multiple times

15/03/03 10:22:41 ERROR ShuffleBlockFetcherIterator: Failed to get block(s)
from host-:39534
java.lang.RuntimeException: java.io.FileNotFoundException:
/hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
(No such file or directory)

15/03/03 10:22:41 ERROR RetryingBlockFetcher: Failed to fetch block
shuffle_0_13_1228, and will not retry (0 retries)
java.lang.RuntimeException: java.io.FileNotFoundException:
/hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index
(No such file or directory)

...
Caused by: java.net.ConnectException: Connection refused: host-


What's the problem?

BTW, I'm using a Spark 1.2.1-SNAPSHOT I built around Dec. 20. Are there any
bug fixes related to shuffle block fetching or index files after that?


Thanks,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


[no subject]

2015-02-18 Thread Luca Puggini



[no subject]

2015-01-17 Thread Kyounghyun Park
Hi,

I'm running Spark 1.2 in yarn-client mode (using Hadoop 2.6.0).
On VirtualBox, I can run spark-shell --master yarn-client without any
error. However, on a physical machine, I get the following error.

Does anyone know why this happens?
Any help would be appreciated.


Thanks,
Kyounghyun


15/01/17 19:34:42 INFO netty.NettyBlockTransferService: Server created on
49709
15/01/17 19:34:42 INFO storage.BlockManagerMaster: Trying to register
BlockManager
15/01/17 19:34:42 INFO storage.BlockManagerMasterActor: Registering block
manager janus:49709 with 265.1 MB RAM, BlockManagerId(driver, janus,
49709)
15/01/17 19:34:42 INFO storage.BlockManagerMaster: Registered BlockManager
15/01/17 19:34:47 WARN remote.ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkYarnAM@192.168.123.178:60626] has failed,
address is now gated for [5000] ms. Reason is: [Disassociated].
15/01/17 19:34:47 ERROR cluster.YarnClientSchedulerBackend: Yarn
application has already exited with state FINISHED!
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/static,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors/threadDump,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/executors,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/environment,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/rdd,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/storage,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/pool,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/stage,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/stages,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/jobs/job/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/jobs/job,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/jobs/json,null}
15/01/17 19:34:47 INFO handler.ContextHandler: stopped
o.e.j.s.ServletContextHandler{/jobs,null}
15/01/17 19:34:47 INFO ui.SparkUI: Stopped Spark web UI at http://janus:4040
15/01/17 19:34:47 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/01/17 19:34:47 INFO cluster.YarnClientSchedulerBackend: Shutting down
all executors
15/01/17 19:34:47 INFO cluster.YarnClientSchedulerBackend: Asking each
executor to shut down
15/01/17 19:34:47 INFO cluster.YarnClientSchedulerBackend: Stopped
15/01/17 19:34:48 INFO spark.MapOutputTrackerMasterActor:
MapOutputTrackerActor stopped!
15/01/17 19:34:48 INFO storage.MemoryStore: MemoryStore cleared
15/01/17 19:34:48 INFO storage.BlockManager: BlockManager stopped
15/01/17 19:34:48 INFO storage.BlockManagerMaster: BlockManagerMaster
stopped
15/01/17 19:34:48 INFO spark.SparkContext: Successfully stopped SparkContext
15/01/17 19:34:48 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Shutting down remote daemon.
15/01/17 19:34:48 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Remote daemon shut down; proceeding with flushing remote transports.
15/01/17 19:34:48 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Remoting shut down.
15/01/17 19:35:07 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend
is ready for scheduling beginning after waiting
maxRegisteredResourcesWaitingTime: 3(ms)
java.lang.NullPointerException
at org.apache.spark.SparkContext.init(SparkContext.scala:497)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:986)
at 

[no subject]

2015-01-14 Thread Jianguo Li
I am using Spark 1.1.1. When I used sbt test, I ran into the
following exceptions. Any idea how to solve it? Thanks! I think
somebody posted this question before, but no one seemed to have
answered it. Could it be the version of io.netty I put in my
build.sbt? I included a dependency libraryDependencies += "io.netty"
% "netty" % "3.6.6.Final" in my build.sbt file.

java.lang.NoClassDefFoundError: io/netty/util/TimerTask
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:72)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:168)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:204)
  at spark.jobserver.util.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:34)
  at spark.jobserver.JobManagerActor.createContextFromConfig(JobManagerActor.scala:255)
  at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:104)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  ...
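A hedged guess rather than an answer from the thread: io.netty.util.TimerTask only
exists in the Netty 4.x line, while the io.netty:netty:3.6.6.Final artifact still
ships the old org.jboss.netty packages and so cannot provide that class. If a 4.x
Netty is what the code path needs, the dependency would look something like this
(version illustrative):

// Netty 4.x publishes the io.netty.* packages, including io.netty.util.TimerTask
libraryDependencies += "io.netty" % "netty-all" % "4.0.23.Final"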


[no subject]

2015-01-10 Thread Krishna Sankar
Guys,

registerTempTable("Employees")

gives me the error

Exception in thread main scala.ScalaReflectionException: class
org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial
classloader with boot classpath
[/Applications/eclipse/plugins/org.scala-lang.scala-library_2.11.4.v20141023-110636-d783face36.jar:/Applications/eclipse/plugins/org.scala-lang.scala-reflect_2.11.4.v20141023-110636-d783face36.jar:/Applications/eclipse/plugins/org.scala-lang.scala-actors_2.11.4.v20141023-110636-d783face36.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre/classes]
not found.


Probably something obvious I am missing.

Everything else works fine, so far.

Any easy fix ?

Cheers

k/


[no subject]

2015-01-03 Thread Sujeevan
Best Regards,

Sujeevan. N


[no subject]

2014-12-04 Thread Subong Kim


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[no subject]

2014-11-26 Thread rapelly kartheek
Hi,
I've been fiddling with spark/*/storage/blockManagerMasterActor.getPeers()
definition in the context of blockManagerMaster.askDriverWithReply()
sending a request GetPeers().

1) I couldn't understand what 'selfIndex' is used for.

2) Also, I tried modifying the 'peers' array by eliminating some
blockManagerIds and passed the modified array to the tabulate method. The
application gets executed, but I find that
blockManagerMaster.askDriverWithReply() receives a sequence of
blockManagerIds that includes the ones I had eliminated previously.

For example,

My original 'peers' array contained 5 blockManagerIds: BlockManagerId(2,
s2, 39997, 0), BlockManagerId(1, s4, 35874, 0),BlockManagerId(3, s1, 33738,
0), BlockManagerId(0, s3, 38207, 0), BlockManagerId(driver, karthik,
34388, 0).

I modified it to peers1, having 3 blockManagerIds: BlockManagerId(2, s2,
39997, 0), BlockManagerId(1, s4, 35874, 0), BlockManagerId(3, s1, 33738, 0).

Then I passed this modified peers1 array for the sequence conversion:

Array.tabulate[BlockManagerId](size) { i => peers1((selfIndex + i + 1) %
peers1.length) }.toSeq

But finally, when /storage/blockManagerMaster.askDriverWithReply() gets
the result, it contains the blockManagerIds that I had purposely
eliminated.

Can someone please help me understand how this Seq[BlockManagerId] is
constructed?
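For reference, a small self-contained sketch of what the tabulate expression above
computes: starting just after selfIndex it walks the array circularly, so with
size = peers.length - 1 it yields every entry except the one at selfIndex, and
smaller sizes take the next `size` entries after self. The strings below stand in
for BlockManagerIds.

val peers = Array("bm2", "bm1", "bm3", "bm0", "driver")
val selfIndex = 4                    // position of the requesting block manager
val size = peers.length - 1          // everything except self
val others = Array.tabulate(size) { i => peers((selfIndex + i + 1) % peers.length) }.toSeq
// others == Seq("bm2", "bm1", "bm3", "bm0")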
Thank you!


[no subject]

2014-10-22 Thread Margusja

unsubscribe

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[no subject]

2014-09-30 Thread PENG ZANG
Hi,

We have a cluster set up with Spark 1.0.2, running 4 workers and 1 master
with 64G RAM each. In the SparkContext we specify 32G of executor memory.
However, as long as a task runs longer than approximately 15 mins, all
the executors are lost, just like some sort of timeout, no matter whether the task
is using up the memory. We tried to increase spark.akka.timeout,
spark.akka.lookupTimeout, and spark.worker.timeout, but still no luck.
Besides, even if we just start a SparkContext and sit there instead of stopping
it, it will still error out with the exception below:

[error] o.a.s.s.TaskSchedulerImpl - Lost executor 0 on XXX06: remote Akka
client disassociated
[error] o.a.s.n.ConnectionManager - Corresponding SendingConnection to
ConnectionManagerId(XXX06.local,34307) not found
[error] o.a.s.s.TaskSchedulerImpl - Lost executor 2 on XXX08: remote Akka
client disassociated
[error] o.a.s.s.TaskSchedulerImpl - Lost executor 1 on XXX07: remote Akka
client disassociated
[error] o.a.s.n.SendingConnection - Exception while reading
SendingConnection to ConnectionManagerId(XXX1,56639)
java.nio.channels.ClosedChannelException: null
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
~[na:1.7.0_60]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
~[na:1.7.0_60]
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
~[spark-core_2.10-1.1.0.jar:1.1.0]
at
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
[spark-core_2.10-1.1.0.jar:1.1.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_60]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_60]
[error] o.a.s.n.SendingConnection - Exception while reading
SendingConnection to ConnectionManagerId(XXX08.local,39914)
java.nio.channels.ClosedChannelException: null
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
~[na:1.7.0_60]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
~[na:1.7.0_60]
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
~[spark-core_2.10-1.1.0.jar:1.1.0]
at
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
[spark-core_2.10-1.1.0.jar:1.1.0]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_60]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_60]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX06:60653]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX06:60653]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@XXX06:60653]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: XXX06/10.40.31.51:60653
]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX06:61000]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX06:61000]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@XXX06:61000]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: XXX06/10.40.31.51:61000
]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX08:52949]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX08:52949]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@XXX08:52949]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: XXX08/10.40.31.53:52949
]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX08:36726]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX08:36726]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@XXX08:36726]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: XXX08/10.40.31.53:36726
]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX07:46516]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX07:46516]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@XXX07:46516]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: XXX07/10.40.31.52:46516
]
[error] a.r.EndpointWriter - AssociationError
[akka.tcp://sparkDriver@losbornev.local:35540] -
[akka.tcp://sparkExecutor@XXX07:48160]: Error [Association failed with
[akka.tcp://sparkExecutor@XXX07:48160]] [
akka.remote.EndpointAssociationException: Association failed with

[no subject]

2014-09-24 Thread Jianshi Huang
One of my big Spark programs always gets stuck at 99%, where a few tasks never
finish.

I debugged it by printing out thread stacktraces, and found there are
workers stuck at parquet.hadoop.ParquetFileReader.readNextRowGroup.

Has anyone had a similar problem? I'm using Spark 1.1.0 built for HDP 2.1. The
parquet files are generated by Pig using the latest parquet-pig-bundle
v1.6.0rc1.

From Spark 1.1.0's pom.xml, Spark is using Parquet v1.4.3; will this be
problematic?

One of the weird behaviors is that another program reads and sorts data
from the same parquet files and it works fine. The only difference seems to be
that the buggy program uses foreachPartition and the working program uses map.

Here's the full stacktrace:

Executor task launch worker-3
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
at
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
at
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
at
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
at
parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
at
parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
at
org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:139)
at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at
scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:913)
at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
at
scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:969)
at
scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179)
at
com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:167)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:767)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:767)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1103)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1103)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at 

[no subject]

2014-08-20 Thread Cường Phạm



ERROR UserGroupInformation: Can't find user in Subject:

2014-08-11 Thread Dan Foisy
Hi

I've installed Spark on a Windows 7 machine.  I can get the SparkShell up
and running but when running through the simple example in Getting Started,
I get the following error (tried running as administrator as well) - any
ideas?

scala> val textFile = sc.textFile("README.md")
14/08/11 08:55:52 WARN SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes
14/08/11 08:55:52 INFO MemoryStore: ensureFreeSpace(34220) called with
curMem=0,  maxMem=322122547
14/08/11 08:55:52 INFO MemoryStore: Block broadcast_0 stored as values to
memory  (estimated size 33.4 KB, free 307.2 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
console:12

scala> textFile.count()

*14/08/10 08:55:58 ERROR UserGroupInformation: Can't find user in Subject:*
*Principal: NTUserPrincipal: danfoisy*


[no subject]

2014-07-07 Thread Juan Rodríguez Hortalá
Hi all,

I'm writing a Spark Streaming program that uses reduceByKeyAndWindow(), and
when I change the windowLength or slidingInterval I get the following
exceptions, running in local mode:


14/07/06 13:03:46 ERROR actor.OneForOneStrategy: key not found:
1404677026000 ms
java.util.NoSuchElementException: key not found: 1404677026000 ms
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at
org.apache.spark.streaming.dstream.ReceiverInputDStream.getReceivedBlockInfo(ReceiverInputDStream.scala:77)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:225)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:223)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:223)
at org.apache.spark.streaming.scheduler.JobGenerator.org
$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:165)
at
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$start$1$$anon$1$$anonfun$receive$1.applyOrElse(JobGenerator.scala:76)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

It looks like an issue with cleaning up some kind of internal checkpoint; is
there any path I should clean to get rid of these exceptions?

Thanks a lot for your help,

Greetings,

Juan


[no subject]

2014-07-05 Thread Konstantin Kudryavtsev
I faced very strange behavior in a job that I ran on a YARN Hadoop
cluster. One of the stages (a map function) was split into 80 tasks; 10 of them
finished successfully in ~2 min, but all the other tasks have been running for
over 40 min and still have not finished... I suspect they are hung.
Any ideas what's going on and how it can be fixed?


Thank you,
Konstantin Kudryavtsev


[no subject]

2014-07-03 Thread Steven Cox
Folks, I have a program derived from the Kafka streaming wordcount example 
which works fine standalone.


Running on Mesos is not working so well. For starters, I get the error below,
"No FileSystem for scheme: hdfs".


I've looked at lots of promising comments on this issue so now I have -

* Every jar under hadoop in my classpath

* Hadoop HDFS and Client in my pom.xml


I find it odd that the app writes checkpoint files to HDFS successfully for a 
couple of cycles then throws this exception. This would suggest the problem is 
not with the syntax of the hdfs URL, for example.


Any thoughts on what I'm missing?
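One hedged guess, since it is not confirmed above: this error often shows up when
a fat-jar build merges META-INF/services files and the
org.apache.hadoop.fs.FileSystem registrations get overwritten, even though all the
HDFS jars are present. Registering the implementations explicitly on the Hadoop
configuration is one way around it (keeping the service files intact with the
shade plugin's ServicesResourceTransformer is another). The class names below are
the standard Hadoop ones; the app name and master are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kafka-streaming-wordcount")
  .setMaster("local[2]") // placeholder; the real job runs against the Mesos master URL
val sc = new SparkContext(conf)

// Map the URL schemes to their FileSystem implementations explicitly, so the lookup
// no longer depends on (possibly clobbered) META-INF/services entries in the assembly.
sc.hadoopConfiguration.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
sc.hadoopConfiguration.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")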


Thanks,


Steve


Mesos : 0.18.2

Spark : 0.9.1



14/07/03 21:14:20 WARN TaskSetManager: Lost TID 296 (task 1514.0:0)

14/07/03 21:14:20 WARN TaskSetManager: Lost TID 297 (task 1514.0:1)

14/07/03 21:14:20 WARN TaskSetManager: Lost TID 298 (task 1514.0:0)

14/07/03 21:14:20 ERROR TaskSetManager: Task 1514.0:0 failed 10 times; aborting 
job

14/07/03 21:14:20 ERROR JobScheduler: Error running job streaming job 
140443646 ms.0

org.apache.spark.SparkException: Job aborted: Task 1514.0:0 failed 10 times 
(most recent failure: Exception failure: java.io.IOException: No FileSystem for 
scheme: hdfs)

at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)

at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)

at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)

at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)

at scala.Option.foreach(Option.scala:236)

at 
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)

at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)

at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)

at akka.actor.ActorCell.invoke(ActorCell.scala:456)

at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)




no subject

2014-05-13 Thread Herman, Matt (CORP)
unsubscribe

--
This message and any attachments are intended only for the use of the addressee 
and may contain information that is privileged and confidential. If the reader 
of the message is not the intended recipient or an authorized representative of 
the intended recipient, you are hereby notified that any dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, notify the sender immediately by return email and delete the message 
and any attachments from your system.

