Spark Shell "No suitable driver found" error

2015-07-10 Thread satish chandra j
Hi All, I am having issues making an external jar available to the Spark shell. I used the --jars option while starting the Spark shell to make it available. When I give the command Class.forName("org.postgresql.Driver") it does not give any error, but when an action operation is performed on the RDD then I am getting
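
For reference, a common cause here: jars passed with --jars end up in a child classloader that java.sql.DriverManager does not always consult, so Class.forName can succeed on the driver while executors still fail once an action runs. A hedged sketch of the usual workaround; the jar path is an assumption and must exist on every worker:

    // launch the shell with the jar on both classpaths explicitly:
    //   ./bin/spark-shell \
    //     --driver-class-path /path/to/postgresql.jar \
    //     --conf spark.executor.extraClassPath=/path/to/postgresql.jar
    Class.forName("org.postgresql.Driver")  // sanity check on the driver side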

Re: Problem in Understanding concept of Physical Cores

2015-07-10 Thread Aniruddh Sharma
Hi TD, Thanks for the elaboration. I have further doubts based on a further test that I did after your guidance. Case 1: Standalone Spark -- In standalone mode, as you explained, the master in spark-submit is local[*] implicitly, so it creates as many threads as the number of cores that the VM has, but the user can

RE: Building scaladoc using build/sbt unidoc failure

2015-07-10 Thread MEETHU MATHEW
Hi, I am getting the same assertion error while trying to run build/sbt unidoc as you described in "Building scaladoc using build/sbt unidoc failure". Could you tell me how you got it working? Quoted: Building scaladoc using build/sbt unidoc failure: Hello, I am trying to build

HiveContext with Cloudera Pseudo Cluster

2015-07-10 Thread Sukhmeet Sethi
Hi All, I am trying to run a simple join on Hive through the Spark shell on a pseudo Cloudera cluster on an Ubuntu machine: val hc = new HiveContext(sc); hc.sql("use testdb"); But it is failing with the message: org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: testdb
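
As a first check, a minimal sketch, assuming the shell can reach the intended metastore at all (if hive-site.xml is missing from Spark's conf directory, Spark falls back to a local Derby metastore that knows nothing about testdb):

    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    hc.sql("show databases").collect().foreach(println)  // does testdb appear here?
    hc.sql("CREATE DATABASE IF NOT EXISTS testdb")       // or create it first
    hc.sql("use testdb")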

Ipython notebook, ec2 spark cluster and matplotlib

2015-07-10 Thread Marco Didonna
Hello everybody, I'm running a two-node Spark cluster on EC2, created using the provided scripts. I then ssh into the master and invoke PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS='notebook --profile=pyspark' spark/bin/pyspark. This launches a Spark notebook which has been instructed

reduceByKeyAndWindow with initial state

2015-07-10 Thread Imran Alam
We have a streaming job that makes use of reduceByKeyAndWindow https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341. We want this to work with an initial state. The idea is to avoid losing state if the
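
For reference, reduceByKeyAndWindow itself takes no initial state; one workaround is to seed updateStateByKey with an initial RDD instead, since PairDStreamFunctions has an overload accepting one. A sketch in which the key/value types and the reloaded state are assumptions:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // hypothetical state reloaded from durable storage on restart
    val initialState: RDD[(String, Long)] = sc.parallelize(Seq(("key1", 42L)))

    val updateFunc = (values: Seq[Long], state: Option[Long]) =>
      Some(values.sum + state.getOrElse(0L))

    def seeded(pairs: DStream[(String, Long)]): DStream[(String, Long)] =
      pairs.updateStateByKey(updateFunc,
        new HashPartitioner(sc.defaultParallelism), initialState)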

Re: Accessing Spark Web UI from another place than where the job actually ran

2015-07-10 Thread Roxana Ioana Roman
Thank you for your answer! The problem is, I cannot ssh to the master directly. I have to ssh first to a frontend, then I have to ssh to another frontend, and only from this last frontend can I ssh to my master. Can I do this by ssh-ing with -L to the first two frontends and to the master? And

Re: query on Spark + Flume integration using push model

2015-07-10 Thread Akhil Das
Here's an example https://github.com/przemek1990/spark-streaming Thanks Best Regards On Thu, Jul 9, 2015 at 4:35 PM, diplomatic Guru diplomaticg...@gmail.com wrote: Hello all, I'm trying to configure the flume to push data into a sink so that my stream job could pick up the data. My events

Re: Accessing Spark Web UI from another place than where the job actually ran

2015-07-10 Thread Akhil Das
When you connect to the machines you can create an ssh tunnel to access the UI: ssh -L 8080:127.0.0.1:8080 MasterMachinesIP And then you can simply open localhost:8080 in your browser and it should show the UI. Thanks Best Regards On Thu, Jul 9, 2015 at 7:44 PM, rroxanaioana

Re: DataFrame insertInto fails, saveAsTable works (Azure HDInsight)

2015-07-10 Thread Akhil Das
It seems to be an issue with Azure; there was a discussion over here: https://azure.microsoft.com/en-in/documentation/articles/hdinsight-hadoop-spark-install/ Thanks Best Regards On Thu, Jul 9, 2015 at 9:42 PM, Daniel Haviv daniel.ha...@veracity-group.com wrote: Hi, I'm running Spark 1.4 on

Re: Caching in spark

2015-07-10 Thread Akhil Das
https://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory Thanks Best Regards On Fri, Jul 10, 2015 at 10:05 AM, vinod kumar vinodsachin...@gmail.com wrote: Hi Guys, Can any one please share me how to use caching feature of spark via spark sql queries? -Vinod
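
For reference, a minimal sketch of both routes, assuming a table already registered under the name myTable:

    sqlContext.cacheTable("myTable")        // programmatic
    sqlContext.sql("CACHE TABLE myTable")   // the same thing via SQL
    sqlContext.uncacheTable("myTable")      // release it when done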

Word2Vec distributed?

2015-07-10 Thread Carsten Schnober
Hi, I've been experimenting with the Spark Word2Vec implementation in the MLlib package. It seems to me that only the preparatory steps are actually performed in a distributed way, i.e. stages 0-2, which prepare the data. In stage 3 (mapPartitionsWithIndex at Word2Vec.scala:312), only one node seems

Saving RDD into cassandra keyspace.

2015-07-10 Thread Prateek .
Hi, I am a beginner to Spark. I want to save each word and its count to a Cassandra keyspace. I wrote the following code: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import com.datastax.spark.connector._ object SparkWordCount { def
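
A hedged sketch of how the word count typically gets persisted with the spark-cassandra-connector; the keyspace and table names here are assumptions and must exist first, e.g. a table words(word text PRIMARY KEY, count int) in keyspace test:

    import com.datastax.spark.connector._

    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveToCassandra("test", "words", SomeColumns("word", "count"))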

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-10 Thread Anand Nalya
Thanks for the help Dean/TD, I was able to cut the lineage with checkpointing with the following code: dstream.countByValue().foreachRDD((rdd, time) => { val joined = rdd.union(current).reduceByKey(_+_, 2).leftOuterJoin(base) val toUpdate = joined.filter(myfilter).map(mymap) val

K Nearest Neighbours

2015-07-10 Thread Carsten Schnober
Hi, I have the following problem, which is a kind of special case of k nearest neighbours. I have an Array of Vectors (v1) and an RDD[(Long, Vector)] of pairs of vectors with indexes (v2). The array v1 easily fits into a single node's memory (~100 entries), but v2 is very large (millions of

Starting Spark-Application without explicit submission to cluster?

2015-07-10 Thread algermissen1971
Hi, I am a bit confused about the steps I need to take to start a Spark application on a cluster. So far I had the impression from the documentation that I need to explicitly submit the application using, for example, spark-submit. However, from the SparkContext constructor signature I get the

SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell

2015-07-10 Thread Prateek .
Hi, I am running single spark-shell but observing this error when I give val sc = new SparkContext(conf) 15/07/10 15:42:56 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use java.net.BindException: Address already in use

Re: SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell

2015-07-10 Thread Akhil Das
That's because sc is already initialized. You can do sc.stop() before you initialize another one. Thanks Best Regards On Fri, Jul 10, 2015 at 3:54 PM, Prateek . prat...@aricent.com wrote: Hi, I am running single spark-shell but observing this error when I give val sc = new
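
In the shell that looks roughly like this (the app name is a placeholder):

    sc.stop()   // releases the old context, including its UI port 4040
    val conf = new org.apache.spark.SparkConf().setAppName("my-shell-app")
    val sc2 = new org.apache.spark.SparkContext(conf)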

Best way to avoid updateStateByKey from running without data

2015-07-10 Thread micvog
UpdateStateByKey will run the update function on every interval, even if the incoming batch is empty. Is there a way to prevent that? If the incoming DStream contains no RDDs (or RDDs of count 0) then I don't want my update function to run. Note that this is different from running the update
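
One partial workaround, sketched under the assumption that state should simply be carried forward untouched: make the update function a per-key no-op when no new values arrived. It does not stop the function from being invoked, but it avoids recomputing state:

    val updateFunc = (newValues: Seq[Int], state: Option[Int]) => {
      if (newValues.isEmpty) state                   // keep existing state as-is
      else Some(state.getOrElse(0) + newValues.sum)  // fold in the new batch
    }
    // pairStream is an assumed DStream[(String, Int)]
    val stateStream = pairStream.updateStateByKey(updateFunc)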

How to restrict disk space for spark caches on yarn?

2015-07-10 Thread Peter Rudenko
Hi, I have a Spark ML workflow. It uses some persist calls. When I launch it with a 1 TB dataset it takes down the whole cluster, because it fills all disk space at /yarn/nm/usercache/root/appcache: http://i.imgur.com/qvRUrOp.png I found a YARN setting:

Re: query on Spark + Flume integration using push model

2015-07-10 Thread diplomatic Guru
Hi Akhil, thank you for your reply. Does that mean that original Spark Streaming only supports Avro? If that is the case, then why only Avro? Is there a particular reason? The project linked is for Scala but I'm using Java. Is there another project? On 10 July 2015 at 08:46, Akhil Das

spark-submit

2015-07-10 Thread AshutoshRaghuvanshi
when I do run this command: ashutosh@pas-lab-server7:~/spark-1.4.0$ ./bin/spark-submit \ --class org.apache.spark.graphx.lib.Analytics \ --master spark://172.17.27.12:7077 \ assembly/target/scala-2.10/spark-assembly-1.4.0-hadoop2.2.0.jar \ pagerank soc-LiveJournal1.txt --numEPart=100

Spark performance

2015-07-10 Thread Ravisankar Mani
Hi everyone, I have planned to move from MSSQL Server to Spark. I am using around 50,000 to 1 lakh (100,000) records. Spark performance is slow when compared to MSSQL Server. Which is the better database (Spark or SQL) for storing and retrieving around 50,000 to 1 lakh records? regards, Ravi

Re: Saving RDD into cassandra keyspace.

2015-07-10 Thread Todd Nist
I would strongly encourage you to read the docs; they are very useful for getting up and running: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md For your use case shown above, you will need to ensure that you include the appropriate version of the

RE: Saving RDD into cassandra keyspace.

2015-07-10 Thread Prateek .
Hi, Thanks Todd, the link is really helpful to get started. ☺ -Prateek From: Todd Nist [mailto:tsind...@gmail.com] Sent: Friday, July 10, 2015 4:43 PM To: Prateek . Cc: user@spark.apache.org Subject: Re: Saving RDD into cassandra keyspace. I would strongly encourage you to read the docs at,

RE: SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell

2015-07-10 Thread Prateek .
Thanks Akhil! I got it. ☺ From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Friday, July 10, 2015 4:02 PM To: Prateek . Cc: user@spark.apache.org Subject: Re: SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell that's because sc

Re: Debug Spark Streaming in PyCharm

2015-07-10 Thread Tathagata Das
spark-submit does a lot of magic configurations (classpaths etc) underneath the covers to enable pyspark to find Spark JARs, etc. I am not sure how you can start running things directly from the PyCharm IDE. Others in the community may be able to answer. For now the main way to run pyspark stuff

Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
Hey, is there any guarantee of fixed ordering among the batches/RDDs? After searching a lot I found that there is no ordering by default (from the framework itself), not only batch-wise but also within batches. But I wonder whether there is any change from old Spark versions to Spark

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
Hello again. So I could compute triangle numbers when running the code from the Spark shell without workers (with the --driver-memory 15g option), but with workers I have errors. So I run the Spark shell: ./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory 6900m --driver-memory 15g and workers

Re: How do we control output part files created by Spark job?

2015-07-10 Thread Srikanth
Is there a join involved in your SQL? Have a look at spark.sql.shuffle.partitions. Srikanth On Wed, Jul 8, 2015 at 1:29 AM, Umesh Kacha umesh.ka...@gmail.com wrote: Hi Srikant, thanks for the response. I have the following code: hiveContext.sql("insert into... ").coalesce(6) The above code does
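
A sketch of the two knobs being discussed, assuming a hiveContext and that six output files is the intended target (the table name is made up):

    // joins/aggregations write one file per shuffle partition (default 200)
    hiveContext.setConf("spark.sql.shuffle.partitions", "6")
    // or shrink an existing result after the fact, without a full shuffle:
    val result = hiveContext.sql("SELECT * FROM some_table").coalesce(6)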

Is it possible to change the default port number 7077 for spark?

2015-07-10 Thread ashishdutt
Hello all, in my lab a colleague installed and configured Spark 1.3.0 on a 4-node cluster in a CDH 5.4 environment. The default port number for our Spark configuration is 7456. I have been trying to SSH to spark-master using this port number, but it fails every time, giving the error that the JVM is timed

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
Thanks Ayan, I was curious to know how Spark does it. Is there any documentation where I can get the details about that? Will you please point me to some detailed link, etc.? Maybe it does something like transactional topologies in Storm. (

Re: JAR containing org.apache.hadoop.mapreduce.lib.input.FileInputFormat

2015-07-10 Thread Ted Yu
For hadoop 2.x: jar tvf ~/2-hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/hadoop-mapreduce-client-core-2.8.0-SNAPSHOT.jar | grep FileInputFormat.class ... 17552 Fri Apr 24 15:57:54 PDT 2015 org/apache/hadoop/mapreduce/lib/input/FileInputFormat.class

RE: Spark performance

2015-07-10 Thread Mohammed Guller
Hi Ravi, First, neither Spark nor Spark SQL is a database. Both are compute engines, which need to be paired with a storage system. Second, they are designed for processing large distributed datasets. If you have only 100,000 records or even a million records, you don't need Spark. An RDBMS

Re: reduceByKeyAndWindow with initial state

2015-07-10 Thread Tathagata Das
Are you talking about reduceByKeyAndWindow with or without inverse reduce? TD On Fri, Jul 10, 2015 at 2:07 AM, Imran Alam im...@newscred.com wrote: We have a streaming job that makes use of reduceByKeyAndWindow

Linear search between particular log4j log lines

2015-07-10 Thread ssbiox
Hello, I have a very specific question on how to search between particular lines of a log file. I did some research to find the answer, and what I learned is that if one of the shuffle operations is applied to an RDD, there is no way to reconstruct the sequence of lines (except by zipping with an id). I'm

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread ayan guha
AFAIK, it is guaranteed that batch t+1 will not start processing until batch t is done. Ordering within a batch: what do you mean by that? In essence, the (mini) batch will get distributed in partitions like a normal RDD, so a following rdd.zipWithIndex should give a way to order them by the time they
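
A sketch of that zipWithIndex idea on a stream; the index reflects partition order within each batch's RDD:

    val indexed = dstream.transform { rdd =>
      rdd.zipWithIndex()   // (record, position-within-this-batch)
    }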

Re: [Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-10 Thread Terry Hole
Michael, Thanks - Terry Michael Armbrust mich...@databricks.com wrote on Saturday, July 11, 2015 at 04:02: Metastore configuration should be set in hive-site.xml. On Thu, Jul 9, 2015 at 8:59 PM, Terry Hole hujie.ea...@gmail.com wrote: Hi, I am trying to set the hive metadata destination to a mysql database

Re: Is it possible to change the default port number 7077 for spark?

2015-07-10 Thread ayan guha
SSH by default should be on port 22. 7456 is the port where the master is listening, so any Spark app should be able to connect to the master using that port. On 11 Jul 2015 13:50, ashishdutt ashish.du...@gmail.com wrote: Hello all, In my lab a colleague installed and configured spark 1.3.0 on a 4

Re: Apache Spark : Custom function for reduceByKey - missing arguments for method

2015-07-10 Thread Richard Marscher
Did you try it by adding the `_` after the method names to partially apply them? Scala is saying that it's trying to immediately apply those methods but can't find arguments, whereas you are trying to pass them along as functions (which they aren't). Here is a link to a stackoverflow answer
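
The fix reads roughly like this; the names are made up for illustration:

    def sumCounts(a: Int, b: Int): Int = a + b

    // eta-expand the method into a function value with a trailing `_`
    val f: (Int, Int) => Int = sumCounts _
    rdd.reduceByKey(sumCounts _)   // rdd is an assumed RDD[(String, Int)]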

Re: spark ec2 as non-root / any plan to improve that in the future ?

2015-07-10 Thread Mathieu D
Quick and clear answer, thank you. 2015-07-09 21:07 GMT+02:00 Nicholas Chammas nicholas.cham...@gmail.com: No plans to change that at the moment, but agreed it is against accepted convention. It would be a lot of work to change the tool, change the AMIs, and test everything. My suggestion is

Re: What is a best practice for passing environment variables to Spark workers?

2015-07-10 Thread Dmitry Goldenberg
Thanks, Akhil. We're trying the conf.setExecutorEnv() approach since we've already got environment variables set. For system properties we'd go the conf.set(spark.) route. We were concerned that doing the below type of thing did not work, which this blog post seems to confirm (
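
A sketch of the routes mentioned, with placeholder names for the variable and property:

    val conf = new org.apache.spark.SparkConf()
      .setAppName("my-app")
      .setExecutorEnv("MY_ENV_VAR", "value")        // env var for executor processes
      .set("spark.executorEnv.MY_OTHER_VAR", "v2")  // equivalent config-key form
      .set("spark.executor.extraJavaOptions", "-Dmy.prop=value") // a JVM system property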

Spark Broadcasting large dataset

2015-07-10 Thread huanglr
Hey guys! I am using Spark for an NGS data application. In my case I have to broadcast a very big dataset to each task. However, there are several tasks (say 48) running on CPUs (also 48 cores) in the same node. These tasks, which run on the same node, could share the same dataset. But

Debug Spark Streaming in PyCharm

2015-07-10 Thread blbradley
Hello, I'm trying to debug a PySpark app with Kafka Streaming in PyCharm. However, PySpark cannot find the jar dependencies for Kafka Streaming without editing the program. I can temporarily use SparkConf to set 'spark.jars', but I'm using Mesos for production and don't want to edit my program

Re: Issues when combining Spark and a third party java library

2015-07-10 Thread maxdml
I'm using hadoop 2.5.2 with spark 1.4.0 and I can also see in my logs: 15/07/09 06:39:02 DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code. java.lang.ClassNotFoundException:

Fwd: SparkSQL Postgres balanced partition of DataFrames

2015-07-10 Thread Moises Baly
Hi, I have a very simple setup of SparkSQL connecting to a Postgres DB, and I'm trying to get a DataFrame from a table with a number X of partitions (let's say 2). The code would be the following: Map<String, String> options = new HashMap<String, String>(); options.put("url", DB_URL);
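
For balanced reads, the JDBC source wants a numeric partition column plus bounds, so each partition covers an equal id range. A sketch with assumed names (the original code is Java; the same options apply):

    val df = sqlContext.read.format("jdbc").options(Map(
      "url"             -> "jdbc:postgresql://host/db",  // DB_URL in the original
      "dbtable"         -> "my_table",
      "partitionColumn" -> "id",       // must be numeric
      "lowerBound"      -> "1",
      "upperBound"      -> "1000000",
      "numPartitions"   -> "2"
    )).load()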

Re: Spark on Tomcat has exception IncompatibleClassChangeError: Implementing class

2015-07-10 Thread Zoran Jeremic
It looks like there is no problem with Tomcat 8. On Fri, Jul 10, 2015 at 11:12 AM, Zoran Jeremic zoran.jere...@gmail.com wrote: Hi Ted, I'm running Tomcat 7 with Java: java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build

SparkHub: a new community site for Apache Spark

2015-07-10 Thread Patrick Wendell
Hi All, Today, I'm happy to announce SparkHub (http://sparkhub.databricks.com), a service for the Apache Spark community to easily find the most relevant Spark resources on the web. SparkHub is a curated list of Spark news, videos and talks, package releases, upcoming events around the world,

Re: Spark on Tomcat has exception IncompatibleClassChangeError: Implementing class

2015-07-10 Thread Ted Yu
What version of Java is Tomcat running? Thanks On Jul 10, 2015, at 10:09 AM, Zoran Jeremic zoran.jere...@gmail.com wrote: Hi, I've developed maven application that uses mongo-hadoop connector to pull data from mongodb and process it using Apache spark. The whole process runs smoothly

Spark Streaming - Inserting into Tables

2015-07-10 Thread Brandon White
Why does this not work? Is insert into broken in 1.3.1? It does not throw any errors, fail, or throw exceptions. It simply does not work. val ssc = new StreamingContext(sc, Minutes(10)) val currentStream = ssc.textFileStream("s3://textFileDirectory/") val dayBefore =

Unit tests of spark application

2015-07-10 Thread Naveen Madhire
Hi, I want to write JUnit test cases in Scala for testing a Spark application. Is there any guide or link to which I can refer? Thank you very much. -Naveen

Re: Spark on Tomcat has exception IncompatibleClassChangeError: Implementing class

2015-07-10 Thread Zoran Jeremic
Hi Ted, I'm running Tomcat 7 with Java: java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Zoran On Fri, Jul 10, 2015 at 10:45 AM, Ted Yu yuzhih...@gmail.com wrote: What version of Java is Tomcat run ?

Re: Unit tests of spark application

2015-07-10 Thread Richard Marscher
Unless you had something specific in mind, it should be as simple as creating a SparkContext object using a master of local[2] in your tests On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire vmadh...@umail.iu.edu wrote: Hi, I want to write junit test cases in scala for testing spark
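
A minimal sketch using ScalaTest (the test framework choice here is an assumption; any JUnit-style runner works the same way):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class WordCountSuite extends FunSuite with BeforeAndAfterAll {
      private var sc: SparkContext = _

      override def beforeAll(): Unit =
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))

      override def afterAll(): Unit = sc.stop()

      test("counts words") {
        val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
        assert(counts("a") == 2)
      }
    }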

Re: Pyspark not working on yarn-cluster mode

2015-07-10 Thread Elkhan Dadashov
Yes, you can launch (from Java code) pyspark scripts with yarn-cluster mode without using the spark-submit script. Check SparkLauncher code in this link https://github.com/apache/spark/tree/master/launcher/src/main/java/org/apache/spark/launcher . SparkLauncher is not dependent on Spark core

Re: Starting Spark-Application without explicit submission to cluster?

2015-07-10 Thread Andrew Or
Hi Jan, Most SparkContext constructors are there for legacy reasons. The point of going through spark-submit is to set up all the classpaths, system properties, and resolve URIs properly *with respect to the deployment mode*. For instance, jars are distributed differently between YARN cluster

Spark on Tomcat has exception IncompatibleClassChangeError: Implementing class

2015-07-10 Thread Zoran Jeremic
Hi, I've developed a Maven application that uses the mongo-hadoop connector to pull data from MongoDB and process it using Apache Spark. The whole process runs smoothly if I run it on an embedded Jetty server. However, if I deploy it to a Tomcat 7 server, it's always interrupted at the line of code which

Re: PySpark without PySpark

2015-07-10 Thread Sujit Pal
Hi Ashish, Cool, glad it worked out. I have only used Spark clusters on EC2, which I spin up using the spark-ec2 scripts (part of the Spark downloads), so I don't have any experience setting up in-house clusters like you want to do. But I found some documentation here that may be helpful.

SparkDriverExecutionException when using actorStream

2015-07-10 Thread Juan Rodríguez Hortalá
Hi, I'm trying to create a Spark Streaming actor stream but I'm having several problems. First of all the guide from https://spark.apache.org/docs/latest/streaming-custom-receivers.html refers to the code

Re: spark-submit

2015-07-10 Thread Andrew Or
Hi Ashutosh, I believe the class is org.apache.spark.examples.graphx.Analytics? If you're running PageRank on LiveJournal you could just use org.apache.spark.examples.graphx.LiveJournalPageRank. -Andrew 2015-07-10 3:42 GMT-07:00 AshutoshRaghuvanshi ashutosh.raghuvans...@gmail.com: when I

SparkR Error in sparkR.init(master=“local”) in RStudio

2015-07-10 Thread kachau
I have installed the SparkR package from the Spark distribution into the R library. I can call the following command and it seems to work properly: library(SparkR) However, when I try to get the Spark context using the following code, sc <- sparkR.init(master="local") it fails after some time with the

dataFrame.coalesce(1) or dataFrame.repartition(1) does not seem to work for me

2015-07-10 Thread kachau
Hi, I have a Hive insert into query which creates new Hive partitions. I have two Hive partitions named server and date. Now I execute insert into queries using the following code and try to save it: DataFrame dframe = hiveContext.sql("insert into summary1 partition(server='a1',date='2015-05-22')

Re: Pyspark not working on yarn-cluster mode

2015-07-10 Thread Sandy Ryza
To add to this, conceptually, it makes no sense to launch something in yarn-cluster mode by creating a SparkContext on the client - the whole point of yarn-cluster mode is that the SparkContext runs on the cluster, not on the client. On Thu, Jul 9, 2015 at 2:35 PM, Marcelo Vanzin

Re: Unit tests of spark application

2015-07-10 Thread Daniel Siegmann
On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire vmadh...@umail.iu.edu wrote: I want to write junit test cases in scala for testing spark application. Is there any guide or link which I can refer. https://spark.apache.org/docs/latest/programming-guide.html#unit-testing Typically I create test

Re: [Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-10 Thread Michael Armbrust
Metastore configuration should be set in hive-site.xml. On Thu, Jul 9, 2015 at 8:59 PM, Terry Hole hujie.ea...@gmail.com wrote: Hi, I am trying to set the hive metadata destination to a mysql database in hive context, it works fine in spark 1.3.1, but it seems broken in spark 1.4.1-rc1,

Re: Unit tests of spark application

2015-07-10 Thread Holden Karau
Somewhat biased of course, but you can also use spark-testing-base from spark-packages.org as a basis for your unit tests. On Fri, Jul 10, 2015 at 12:03 PM, Daniel Siegmann daniel.siegm...@teamaol.com wrote: On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire vmadh...@umail.iu.edu wrote: I want

RE: Spark Broadcasting large dataset

2015-07-10 Thread Ashic Mahtab
When you say tasks, do you mean different applications, or different tasks in the same application? If it's the same program, they should be able to share the broadcast value. But given you're asking the question, I imagine they're separate. And in that case, AFAIK, the answer is no. You
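
Within one application the sharing is automatic, roughly as in this sketch: the value is sent to each executor once and every task on that executor reads the same local copy (the lookup contents are made up):

    // build the large value once on the driver
    val bigLookup: Map[String, Int] =
      (1 to 1000000).map(i => i.toString -> i).toMap
    val bc = sc.broadcast(bigLookup)

    // rdd is an assumed RDD[String]
    val resolved = rdd.mapPartitions { iter =>
      val lookup = bc.value   // same per-executor copy for all 48 tasks
      iter.map(key => lookup.getOrElse(key, 0))
    }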

Re: RE: Spark Broadcasting large dataset

2015-07-10 Thread huanglr
Hi Ashic, Thank you very much for your reply! The tasks I mention are a running Function that I implemented with the Spark API and passed to each partition of an RDD. Within the Function I broadcast a big variable to be queried by each partition. So, when I am running on a 48-core slave node, I

Re: Issues when combining Spark and a third party java library

2015-07-10 Thread maxdml
Also, it's worth noting that I'm using the prebuilt version for hadoop 2.4 and higher from the official website.

Re: Unit tests of spark application

2015-07-10 Thread Burak Yavuz
I can +1 Holden's spark-testing-base package. Burak On Fri, Jul 10, 2015 at 12:23 PM, Holden Karau hol...@pigscanfly.ca wrote: Somewhat biased of course, but you can also use spark-testing-base from spark-packages.org as a basis for your unittests. On Fri, Jul 10, 2015 at 12:03 PM, Daniel

Spark Streaming and using Swift object store for checkpointing

2015-07-10 Thread algermissen1971
Hi, when moving my streaming application to the cluster for the first time today, I ran into the newbie error of using a local file system for checkpointing, and the RDD partition count differences (see exception below). Having neither HDFS nor S3 (and the Cassandra connector not yet

SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder

2015-07-10 Thread Mulugeta Mammo
Hi, My spark job runs without error, but once it completes I get this message and the app is logged as incomplete application in my spark-history : SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See

JAR containing org.apache.hadoop.mapreduce.lib.input.FileInputFormat

2015-07-10 Thread Lincoln Atkinson
Sorry, only indirectly Spark-related. I've been attempting to create a .NET proxy for spark-core, using JNI4NET. At the moment I'm stuck with the following error when running the proxy generator: java.lang.NoClassDefFoundError: org.apache.hadoop.mapreduce.lib.input.FileInputFormat I've resolved

Re: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder

2015-07-10 Thread Mulugeta Mammo
No. Works perfectly. On Fri, Jul 10, 2015 at 3:38 PM, liangdianpeng liangdianp...@vip.163.com wrote: if the class inside the spark_XXX.jar was damaged [Sent from NetEase Mail mobile] On 2015-07-11 06:13, Mulugeta Mammo mulugeta.abe...@gmail.com Wrote: Hi, My spark job runs without error, but once it