Cassandra and Spark checkpoints

2014-06-25 Thread toivoa
According to the "DataStax Brings Spark to Cassandra" press release:

"DataStax has partnered with Databricks, the company founded by the creators
of Apache Spark, to build a supported, open source integration between the
two platforms. The partners expect to have the integration ready by this
summer."

How far does this integration go?
For example, is it possible to use Cassandra as distributed checkpoint
storage, or is HDFS currently the only supported option?

Thanks
Toivo




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cassandra-and-Spark-checkpoints-tp8254.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark 1.0.0 Maven dependencies problems.

2014-06-10 Thread toivoa
Thanks for the hint.

I removed the signature info from the jar, and the JVM is happy now.

But the underlying problem remains: several copies of the same jar at different versions, which is not good.
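
An editorial aside, not from the original thread: with Maven, one hedged way to at least surface duplicate versions like this is the maven-enforcer-plugin's dependencyConvergence rule, which fails the build whenever two versions of the same artifact converge differently. The plugin version below is an assumption for a 2014-era build:

```xml
<!-- Sketch: fail the build when transitive dependencies disagree on a
     version, so duplicate jars become visible instead of silently winning. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>1.3.1</version>
  <executions>
    <execution>
      <id>enforce-convergence</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Once the conflicting artifacts are visible, they can be pinned in `<dependencyManagement>` or trimmed with `<exclusions>`.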

Spark itself is very, very promising; I am very excited.


Thank you all
toivo



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-Maven-dependencies-problems-tp7247p7309.html


Re: Spark 1.0.0 Maven dependencies problems.

2014-06-09 Thread toivoa
I am using Maven from Eclipse.

dependency:tree shows:


[INFO] +- org.apache.spark:spark-core_2.10:jar:1.0.0:compile
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:runtime
[INFO] |  +- org.apache.curator:curator-recipes:jar:2.4.0:compile
[INFO] |  |  +- org.apache.curator:curator-framework:jar:2.4.0:compile
[INFO] |  |  |  \- org.apache.curator:curator-client:jar:2.4.0:compile
[INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] |  |     \- jline:jline:jar:0.9.94:compile
[INFO] |  +- org.eclipse.jetty:jetty-plus:jar:8.1.14.v20131031:compile
[INFO] |  |  +- org.eclipse.jetty.orbit:javax.transaction:jar:1.1.1.v201105210645:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-webapp:jar:8.1.14.v20131031:compile
[INFO] |  |  |  +- org.eclipse.jetty:jetty-xml:jar:8.1.14.v20131031:compile
[INFO] |  |  |  \- org.eclipse.jetty:jetty-servlet:jar:8.1.14.v20131031:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-jndi:jar:8.1.14.v20131031:compile
[INFO] |  |     \- org.eclipse.jetty.orbit:javax.mail.glassfish:jar:1.4.1.v201005082020:compile
[INFO] |  |        \- org.eclipse.jetty.orbit:javax.activation:jar:1.1.0.v201105071233:compile
[INFO] |  +- org.eclipse.jetty:jetty-security:jar:8.1.14.v20131031:compile
[INFO] |  +- org.eclipse.jetty:jetty-server:jar:8.1.14.v20131031:compile
[INFO] |  |  +- org.eclipse.jetty.orbit:javax.servlet:jar:3.0.0.v201112011016:compile
[INFO] |  |  +- org.eclipse.jetty:jetty-continuation:jar:8.1.14.v20131031:compile
[INFO] |  |  \- org.eclipse.jetty:jetty-http:jar:8.1.14.v20131031:compile
[INFO] |  |     \- org.eclipse.jetty:jetty-io:jar:8.1.14.v20131031:compile
[INFO] |  +- com.google.guava:guava:jar:14.0.1:compile
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.3.2:compile
[INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
[INFO] |  +- org.slf4j:slf4j-api:jar:1.7.5:compile
[INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.5:compile
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.5:compile
[INFO] |  +- log4j:log4j:jar:1.2.17:compile
[INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
[INFO] |  +- com.ning:compress-lzf:jar:1.0.0:compile
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
[INFO] |  +- com.twitter:chill_2.10:jar:0.3.6:compile
[INFO] |  |  \- com.esotericsoftware.kryo:kryo:jar:2.21:compile
[INFO] |  |     +- com.esotericsoftware.reflectasm:reflectasm:jar:shaded:1.07:compile
[INFO] |  |     +- com.esotericsoftware.minlog:minlog:jar:1.2:compile
[INFO] |  |     \- org.objenesis:objenesis:jar:1.2:compile
[INFO] |  +- com.twitter:chill-java:jar:0.3.6:compile
[INFO] |  +- commons-net:commons-net:jar:2.2:compile
[INFO] |  +- org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf:compile
[INFO] |  |  +- org.spark-project.akka:akka-actor_2.10:jar:2.2.3-shaded-protobuf:compile
[INFO] |  |  |  \- com.typesafe:config:jar:1.0.2:compile
[INFO] |  |  +- io.netty:netty:jar:3.6.6.Final:compile
[INFO] |  |  +- org.spark-project.protobuf:protobuf-java:jar:2.4.1-shaded:compile
[INFO] |  |  \- org.uncommons.maths:uncommons-maths:jar:1.2.2a:compile
[INFO] |  +- org.spark-project.akka:akka-slf4j_2.10:jar:2.2.3-shaded-protobuf:compile
[INFO] |  +- org.scala-lang:scala-library:jar:2.10.4:compile
[INFO] |  +- org.json4s:json4s-jackson_2.10:jar:3.2.6:compile
[INFO] |  |  +- org.json4s:json4s-core_2.10:jar:3.2.6:compile
[INFO] |  |  |  +- org.json4s:json4s-ast_2.10:jar:3.2.6:compile
[INFO] |  |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.6:compile
[INFO] |  |  |  \- org.scala-lang:scalap:jar:2.10.0:compile
[INFO] |  |  |     \- org.scala-lang:scala-compiler:jar:2.10.0:compile
[INFO] |  |  |        \- org.scala-lang:scala-reflect:jar:2.10.0:compile
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-databind:jar:2.3.0:compile
[INFO] |  |     +- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
[INFO] |  |     \- com.fasterxml.jackson.core:jackson-core:jar:2.3.0:compile
[INFO] |  +- colt:colt:jar:1.2.0:compile
[INFO] |  |  \- concurrent:concurrent:jar:1.3.4:compile
[INFO] |  +- org.apache.mesos:mesos:jar:shaded-protobuf:0.18.1:compile
[INFO] |  +- io.netty:netty-all:jar:4.0.17.Final:compile
[INFO] |  +- com.clearspring.analytics:stream:jar:2.5.1:compile
[INFO] |  +- com.codahale.metrics:metrics-core:jar:3.0.0:compile
[INFO] |  +- com.codahale.metrics:metrics-jvm:jar:3.0.0:compile
[INFO] |  +- com.codahale.metrics:metrics-json:jar:3.0.0:compile
[INFO] |  +- com.codahale.metrics:metrics-graphite:jar:3.0.0:compile
[INFO] |  +- org.tachyonproject:tachyon:jar:0.4.1-thrift:compile
[INFO] |  |  +- org.apache.ant:ant:jar:1.9.0:compile
[INFO] |  |  |  \- org.apache.ant:ant-launcher:jar:1.9.0:compile
[INFO] |  |  \- commons-io:commons-io:jar:2.4:compile
[INFO] |  +- org.spark-project:pyrolite:jar:2.0.1:compile
[INFO] |  \- net.sf.py4j:py4j:jar:0.8.1:compile
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.4.0:compile
[INFO] |  |  +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] 

Spark 1.0.0 Maven dependencies problems.

2014-06-09 Thread toivoa
Using

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.0.0</version>
</dependency>

I can create a simple test and run it under Eclipse.
But when I try to deploy on a test server, I run into dependency problems.

1. Spark requires

<dependency>
  <groupId>org.spark-project.akka</groupId>
  <artifactId>akka-remote_2.10</artifactId>
  <version>2.2.3-shaded-protobuf</version>
</dependency>

and this in turn requires

<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty</artifactId>
  <version>3.6.6.Final</version>
</dependency>

2. At the same time, Spark itself requires Netty 4.0.17.Final (io.netty:netty-all in the tree above).
So now I have two different Netty versions on the classpath, and I get either

Exception in thread "main" java.lang.SecurityException: class
"javax.servlet.FilterRegistration"'s signer information does not match
signer information of other classes in the same package

When using 3.6.6.Final


Or

14/06/09 16:08:10 ERROR ActorSystemImpl: Uncaught fatal error from thread
[spark-akka.actor.default-dispatcher-4] shutting down ActorSystem [spark]
java.lang.NoClassDefFoundError: org/jboss/netty/util/Timer

When using 4.0.17.Final



What am I doing wrong, and how can I solve this problem?
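
An editorial aside, not part of the original exchange: the two Netty lines are actually different artifacts (3.x lives in org.jboss.netty packages, 4.x in io.netty packages), so both can usually coexist; the NoClassDefFoundError for org/jboss/netty/util/Timer suggests the 3.x jar was removed. The signer error, by contrast, typically means two jars both provide javax.servlet classes (Jetty's orbit javax.servlet jar plus a servlet-api pulled in transitively). A hedged Maven exclusion sketch; which dependency actually drags in the duplicate servlet-api is an assumption about this project:

```xml
<!-- Sketch: exclude the transitive servlet-api so only the signed
     org.eclipse.jetty.orbit:javax.servlet jar from the dependency
     tree remains on the classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree -Dverbose` shows which path pulls in each duplicate, so the exclusion can be placed on the right dependency.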

Thanks 
toivo




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-Maven-dependencies-problems-tp7247.html


Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Wow! What a quick reply!

Adding

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.0</version>
</dependency>

solved the problem.

But now I get 

14/06/03 19:52:50 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
	at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
	at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
	at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:77)
	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
	at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
	at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:36)
	at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:109)
	at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
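
An aside for anyone hitting the same warning on Windows, not from the original thread: the "null\bin\winutils.exe" in the log means hadoop.home.dir / HADOOP_HOME is unset, so Hadoop's Shell class cannot build the path to winutils.exe. A minimal sketch of the usual workaround; the install path below is hypothetical:

```shell
# Sketch only: point HADOOP_HOME at a directory that contains bin/winutils.exe
# (on Windows, e.g. C:\hadoop with bin\winutils.exe inside).
export HADOOP_HOME=/opt/hadoop
export PATH="$HADOOP_HOME/bin:$PATH"
echo "Hadoop will look for winutils at: $HADOOP_HOME/bin/winutils.exe"
```

Alternatively, the same location can be set from code before the SparkContext is created, via the hadoop.home.dir system property that Shell checks first.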

thanks
toivo



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/wholeTextFiles-java-lang-IncompatibleClassChangeError-Found-class-org-apache-hadoop-mapreduce-TaskAtd-tp6818p6820.html


wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Hi

I set up a project under Eclipse using Maven:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.0.0</version>
</dependency>

A simple example fails:

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("CountingSheep")
      .set("spark.executor.memory", "1g")

    val sc = new SparkContext(conf)

    val indir = "src/main/resources/testdata"
    val files = sc.wholeTextFiles(indir, 10)
    for (pair <- files)
      println(pair._1 + " = " + pair._2)
  }

14/06/03 19:20:34 ERROR executor.Executor: Exception in task ID 0
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:164)
	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.<init>(CombineFileRecordReader.java:126)
	at org.apache.spark.input.WholeTextFileInputFormat.createRecordReader(WholeTextFileInputFormat.scala:44)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:155)
	... 13 more
Caused by: java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
	at org.apache.spark.input.WholeTextFileRecordReader.<init>(WholeTextFileRecordReader.scala:40)
	... 18 more

Any idea?

thanks
toivo




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/wholeTextFiles-java-lang-IncompatibleClassChangeError-Found-class-org-apache-hadoop-mapreduce-TaskAtd-tp6818.html