What is the meaning of 'STATE' in a worker/executor?

2015-03-29 Thread Niranda Perera
Hi,

I have noticed in the Spark UI that workers and executors go through several
states: ALIVE, LOADING, RUNNING, DEAD, etc.

What exactly do these states mean, and what effect do they have on working
with those executors?
e.g. whether an executor cannot be used while in the LOADING state, etc.

cheers

-- 
Niranda


Re: what is the roadmap for Spark SQL dialect in the coming releases?

2015-01-25 Thread Niranda Perera
Thanks Michael.

A clarification: does the HQL dialect provided by HiveContext use the
Catalyst optimizer? I thought HiveContext was only related to Hive
integration in Spark!

I would be grateful if you could clarify this.

cheers

On Sun, Jan 25, 2015 at 1:23 AM, Michael Armbrust mich...@databricks.com
wrote:

 I generally recommend people use the HQL dialect provided by the
 HiveContext when possible:
 http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started

 I'll also note that this is distinct from the Hive on Spark project, which
 is based on the Hive query optimizer / execution engine instead of the
 catalyst optimizer that is shipped with Spark.
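
 To make the distinction concrete, here is a minimal Java sketch (an
 illustration only, assuming Spark 1.2's Java API with the spark-hive_2.10
 artifact on the classpath; the table name "src" is purely hypothetical).
 Queries submitted through a JavaHiveContext are parsed with the HiveQL
 dialect and then planned and optimized by Catalyst:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HqlDialectSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("hql-dialect-sketch");
        JavaSparkContext sparkContext = new JavaSparkContext(conf);

        // JavaHiveContext wraps a HiveContext: by default its SQL text is parsed with
        // the HiveQL dialect, then planned by Catalyst like any other Spark SQL query.
        JavaHiveContext hiveContext = new JavaHiveContext(sparkContext);

        // "src" is a hypothetical Hive table used only for illustration.
        List<Row> rows = hiveContext.sql("SELECT key, value FROM src LIMIT 10").collect();
        for (Row row : rows) {
            System.out.println(row);
        }

        sparkContext.stop();
    }
}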

 On Thu, Jan 22, 2015 at 3:12 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I would like to know if there is an update on this.

 rgds

 On Mon, Jan 12, 2015 at 10:44 AM, Niranda Perera 
 niranda.per...@gmail.com wrote:

 Hi,

 I found out that Spark SQL currently supports only a relatively small subset
 of the SQL dialect.

 I would like to know the roadmap for the coming releases.

 And, are you focusing more on popularizing the 'Hive on Spark' SQL
 dialect or the Spark SQL dialect?

 Rgds
 --
 Niranda




 --
 Niranda





-- 
Niranda


Re: what is the roadmap for Spark SQL dialect in the coming releases?

2015-01-22 Thread Niranda Perera
Hi,

I would like to know if there is an update on this.

rgds

On Mon, Jan 12, 2015 at 10:44 AM, Niranda Perera niranda.per...@gmail.com
wrote:

 Hi,

 I found out that Spark SQL currently supports only a relatively small subset
 of the SQL dialect.

 I would like to know the roadmap for the coming releases.

 And, are you focusing more on popularizing the 'Hive on Spark' SQL dialect
 or the Spark SQL dialect?

 Rgds
 --
 Niranda




-- 
Niranda


what is the roadmap for Spark SQL dialect in the coming releases?

2015-01-11 Thread Niranda Perera
Hi,

I found out that Spark SQL currently supports only a relatively small subset
of the SQL dialect.

I would like to know the roadmap for the coming releases.

And, are you focusing more on popularizing the 'Hive on Spark' SQL dialect
or the Spark SQL dialect?

Rgds
-- 
Niranda


example insert statement in Spark SQL

2015-01-07 Thread Niranda Perera
Hi,

Are insert statements supported in Spark? If so, can you please give me an
example?
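
For illustration, a hedged sketch of the kind of statement that generally
works through the HiveQL dialect of a HiveContext (a Hive-enabled Spark 1.2
build is assumed, and the tables "events" and "filtered_events" are
hypothetical, used only for this example):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class InsertStatementSketch {
    public static void main(String[] args) {
        JavaSparkContext sparkContext = new JavaSparkContext(
                new SparkConf().setMaster("local[2]").setAppName("insert-statement-sketch"));
        JavaHiveContext hiveContext = new JavaHiveContext(sparkContext);

        // HiveQL-style INSERT ... SELECT into a Hive-managed table; both table names
        // are hypothetical and exist only for illustration.
        hiveContext.sql("CREATE TABLE IF NOT EXISTS filtered_events (id INT, name STRING)");
        hiveContext.sql("INSERT INTO TABLE filtered_events "
                + "SELECT id, name FROM events WHERE id > 100");

        sparkContext.stop();
    }
}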

Rgds

-- 
Niranda


Re: Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi Sean,

My mistake, the Guava 11 dependency did indeed come from hadoop-common.

I'm running the following simple app on a Spark 1.2.0 standalone local
cluster (2 workers) with Hadoop 1.2.1:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import com.databricks.spark.avro.AvroUtils;

public class AvroSparkTest {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("spark://niranda-ThinkPad-T540p:7077") // ("local[2]")
                .setAppName("avro-spark-test");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);

        // Load the Avro file as a SchemaRDD and register it as a temporary table.
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
        episodes.registerTempTable("avroTable");

        List<Row> result = sqlContext.sql("SELECT * FROM avroTable").collect();

        for (Row row : result) {
            System.out.println(row.toString());
        }
    }
}

As you pointed out, this error occurs when the Hadoop dependency is added;
it runs without a problem when the Hadoop dependency is removed and the
master is set to local[].

Cheers

On Tue, Jan 6, 2015 at 3:23 PM, Sean Owen so...@cloudera.com wrote:

 -dev

 Guava was not downgraded to 11. That PR was not merged. It was part of a
 discussion about, indeed, what to do about potential Guava version
 conflicts. Spark uses Guava, but so does Hadoop, and so do user programs.

 Spark uses 14.0.1 in fact:
 https://github.com/apache/spark/blob/master/pom.xml#L330

 This is a symptom of conflict between Spark's Guava 14 and Hadoop's Guava
 11. See for example https://issues.apache.org/jira/browse/HIVE-7387 as
 well.

 Guava is now shaded in Spark as of 1.2.0 (and 1.1.x?), so I would think a
 lot of these problems are solved. As we've seen though, this one is tricky.

 What's your Spark version? and what are you executing? what mode --
 standalone, YARN? What Hadoop version?
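
 One quick way to narrow this down is to check which Guava jar the driver JVM
 is actually loading the hashing classes from. A minimal diagnostic sketch
 (the class name is arbitrary):

public class GuavaClasspathProbe {
    public static void main(String[] args) throws Exception {
        // Print the jar that Guava's HashFunction is loaded from; a Guava 11 jar pulled
        // in transitively (e.g. by Hadoop) would lack hashInt(int) and trigger exactly
        // this NoSuchMethodError.
        Class<?> hashFunction = Class.forName("com.google.common.hash.HashFunction");
        System.out.println(hashFunction.getProtectionDomain().getCodeSource().getLocation());

        // Throws NoSuchMethodException if the Guava version on the classpath predates 12.
        System.out.println(hashFunction.getMethod("hashInt", int.class));
    }
}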


 On Tue, Jan 6, 2015 at 8:38 AM, Niranda Perera niranda.per...@gmail.com
 wrote:

 Hi,

 I have been running a simple Spark app on a local Spark cluster and I
 came across this error.

 Exception in thread "main" java.lang.NoSuchMethodError:
 com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
 at org.apache.spark.util.collection.OpenHashSet.org
 $apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
 at
 org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
 at
 org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
 at
 org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
 at
 org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
 at
 org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
 at
 org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
 at
 org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
 at
 org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
 at
 org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
 at
 org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
 at
 org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
 at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
 at
 org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
 at
 org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
 at
 org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
 at
 org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
 at
 org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
 at
 org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
 at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
 at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
 at
 com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
 at
 com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
 at
 org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
 at
 org.apache.spark.sql.catalyst.planning.QueryPlanner

Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
Hi,

I have been running a simple Spark app on a local Spark cluster and I came
across this error.

Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
at org.apache.spark.util.collection.OpenHashSet.org
$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
at
org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
at
org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
at
org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at
org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
at
org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
at
org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
at
org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
at
org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
at
org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
at
org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
at
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
at
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
at
org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
at
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
at
org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
at
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:84)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
at
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
at
com.databricks.spark.avro.AvroRelation.buildScan$lzycompute(AvroRelation.scala:45)
at
com.databricks.spark.avro.AvroRelation.buildScan(AvroRelation.scala:44)
at
org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:56)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
at
org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)


While looking into this, I found out that Guava was downgraded to version 11
in this PR:
https://github.com/apache/spark/pull/1610

In that PR, the hashInt call at OpenHashSet.scala:261 was changed to hashLong.
But when I actually run my app, a java.lang.NoSuchMethodError:
com.google.common.hash.HashFunction.hashInt error occurs,
which is understandable because hashInt is not available before Guava 12.

So I'm wondering why this occurs.

Cheers
-- 
Niranda Perera


Spark avro: Sample app fails to run in a spark standalone cluster

2015-01-05 Thread Niranda Perera
(ObjectStreamClass.java:969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1775)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1327)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

What might be the reason behind this? I'm using IntelliJ IDEA as my IDE.

I have attached the logs herewith.

Rgds
-- 
Niranda Perera
/usr/local/java/jdk1.7.0_55/bin/java -Didea.launcher.port=7532 
-Didea.launcher.bin.path=/home/niranda/software/idea-IU-135.1306/bin 
-Dfile.encoding=UTF-8 -classpath 
/usr/local/java/jdk1.7.0_55/jre/lib/jce.jar:/usr/local/java/jdk1.7.0_55/jre/lib/javaws.jar:/usr/local/java/jdk1.7.0_55/jre/lib/rt.jar:/usr/local/java/jdk1.7.0_55/jre/lib/management-agent.jar:/usr/local/java/jdk1.7.0_55/jre/lib/jfxrt.jar:/usr/local/java/jdk1.7.0_55/jre/lib/plugin.jar:/usr/local/java/jdk1.7.0_55/jre/lib/jfr.jar:/usr/local/java/jdk1.7.0_55/jre/lib/jsse.jar:/usr/local/java/jdk1.7.0_55/jre/lib/resources.jar:/usr/local/java/jdk1.7.0_55/jre/lib/deploy.jar:/usr/local/java/jdk1.7.0_55/jre/lib/charsets.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/sunec.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/dnsns.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/sunjce_provider.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/zipfs.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/sunpkcs11.jar:/usr/local/java/jdk1.7.0_55/jre/lib/ext/localedata.jar:/home/niranda/projects/avro-spark-test/target/classes:/home/niranda/.m2/repository/com/databricks/spark-avro_2.10/0.1/spark-avro_2.10-0.1.jar:/home/niranda/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar:/home/niranda/.m2/repository/org/apache/avro/avro/1.7.7/avro-1.7.7.jar:/home/niranda/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar:/home/niranda/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl

SparkSQL 1.2.0 sources API error

2015-01-02 Thread Niranda Perera
Hi all,

I am evaluating the Spark SQL data sources API released with Spark 1.2.0,
but I'm getting a java.lang.NoSuchMethodError:
org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V
error when running the program.

Error log:
15/01/03 10:41:30 ERROR ActorSystemImpl: Uncaught fatal error from thread
[sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down
ActorSystem [sparkDriver]
java.lang.NoSuchMethodError:
org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V
at
akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:283)
at
akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:240)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:161)
at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:200)
at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:692)
at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:684)
at
scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at
scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
at
akka.remote.EndpointManager.akka$remote$EndpointManager$$listens(Remoting.scala:684)
at
akka.remote.EndpointManager$$anonfun$receive$2.applyOrElse(Remoting.scala:492)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at akka.remote.EndpointManager.aroundReceive(Remoting.scala:395)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



Following is my simple Java code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import com.databricks.spark.avro.AvroUtils;

public class AvroSparkTest {

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("avro-spark-test")
                .setSparkHome("/home/niranda/software/spark-1.2.0-bin-hadoop1");

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaSQLContext sqlContext = new JavaSQLContext(sparkContext);

        // Load the Avro file as a SchemaRDD and print its inferred schema.
        JavaSchemaRDD episodes = AvroUtils.avroFile(sqlContext,
                "/home/niranda/projects/avro-spark-test/src/test/resources/episodes.avro");
        episodes.printSchema();
    }
}

Dependencies:
<dependencies>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-avro_2.10</artifactId>
        <version>0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.2.0</version>
    </dependency>
</dependencies>

I'm using Java 1.7, IntelliJ IDEA and Maven as the build tool.

What might cause this error and what may be the remedy?
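
A NoSuchMethodError on a constructor like this usually points to a version
conflict: an older org.jboss.netty (Netty 3.x) jar that predates the
NioWorkerPool(Executor, int) constructor may be shadowing the one Akka
expects. A minimal diagnostic sketch (the class name is arbitrary) that lists
every copy of that class visible on the classpath, and from which jars:

import java.net.URL;
import java.util.Enumeration;

public class NettyClasspathProbe {
    public static void main(String[] args) throws Exception {
        // List every copy of NioWorkerPool visible to the context class loader; more
        // than one entry, or a single entry from an old netty 3.x jar, would explain
        // the missing (Executor, int) constructor.
        Enumeration<URL> copies = Thread.currentThread().getContextClassLoader()
                .getResources("org/jboss/netty/channel/socket/nio/NioWorkerPool.class");
        while (copies.hasMoreElements()) {
            System.out.println(copies.nextElement());
        }
    }
}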

Cheers

-- 
Niranda


Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
Hi Michael,

About this new data sources API: what types of data sources would it
support? Does it necessarily have to be an RDBMS?

Cheers

On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust mich...@databricks.com
wrote:

 You probably don't need to create a new kind of SchemaRDD.  Instead I'd
 suggest taking a look at the data sources API that we are adding in Spark
 1.2.  There is not a ton of documentation, but the test cases show how to
 implement the various interfaces
 https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources,
 and there is an example library for reading Avro data
 https://github.com/databricks/spark-avro.

 On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera nira...@wso2.com wrote:

 Hi,

 I am evaluating Spark for an analytic component where we do batch
 processing of data using SQL.

 So, I am particularly interested in Spark SQL and in creating a SchemaRDD
 from an existing API [1].

 This API exposes elements in a database as datasources. Using the methods
 allowed by this data source, we can access and edit data.

 So, I want to create a custom SchemaRDD using the methods and provisions of
 this API. I tried going through the Spark documentation and the Javadocs,
 but unfortunately I was unable to come to a conclusion as to whether this is
 actually possible.

 I would like to ask the Spark devs:
 1. As of the current Spark release, can we make a custom SchemaRDD?
 2. What is the extension point for a custom SchemaRDD? Or are there
 particular interfaces?
 3. Could you please point me to the specific docs regarding this matter?

 Your help in this regard is highly appreciated.

 Cheers

 [1]

 https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

 --
 *Niranda Perera*
 Software Engineer, WSO2 Inc.
 Mobile: +94-71-554-8430
 Twitter: @n1r44 https://twitter.com/N1R44





-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44


Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
Hi,

I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.

So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].

This API exposes elements in a database as datasources. Using the methods
allowed by this data source, we can access and edit data.

So, I want to create a custom SchemaRDD using the methods and provisions of
this API. I tried going through the Spark documentation and the Javadocs, but
unfortunately I was unable to come to a conclusion as to whether this is
actually possible.

I would like to ask the Spark devs:
1. As of the current Spark release, can we make a custom SchemaRDD?
2. What is the extension point for a custom SchemaRDD? Or are there
particular interfaces?
3. Could you please point me to the specific docs regarding this matter?

Your help in this regard is highly appreciated.

Cheers

[1]
https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44