Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Judy, this is exactly what I'm looking for. However, and please forgive me if it's a dumb question: it seems to me that thrift is the same as the hive2 JDBC driver. Does this mean that starting thrift will start hive as well on the server? On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash

Re: do not assemble the spark example jar

2014-12-09 Thread lihu
Can this assembly get faster if we do not need Spark SQL or some other component of Spark, such as when we only need the core of Spark? On Wed, Nov 26, 2014 at 3:37 PM, lihu lihu...@gmail.com wrote: Matei, sorry for my last typo. And the tip saves about 30s on my computer. On

Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
Essentially, the Spark SQL JDBC Thrift server is just a Spark port of HiveServer2. You don't need to run Hive, but you do need a working Metastore. On 12/9/14 3:59 PM, Anas Mosaad wrote: Thanks Judy, this is exactly what I'm looking for. However, and please forgive me if it's a dumb question:
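
A minimal sketch of querying the Thrift server over JDBC with the plain HiveServer2 driver (hive-jdbc on the classpath is assumed); localhost:10000 is just the default endpoint, and the empty credentials are placeholders.

    import java.sql.DriverManager

    // The Thrift server speaks the HiveServer2 protocol, so the standard Hive JDBC driver works.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
    val rs = conn.createStatement().executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    conn.close()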

Re: How can I create an RDD with millions of entries created programmatically

2014-12-09 Thread Daniel Darabos
Ah... I think you're right about the flatMap then :). Or you could use mapPartitions. (I'm not sure if it makes a difference.) On Mon, Dec 8, 2014 at 10:09 PM, Steve Lewis lordjoe2...@gmail.com wrote: looks good, but how do I say that in Java? As far as I can see sc.parallelize (in Java) has

KafkaUtils explicit acks

2014-12-09 Thread Mukesh Jha
Hello Experts, I'm working on a spark app which reads data from kafka and persists it in hbase. The Spark documentation states *[1]* that in case of worker failure we can lose some data. If so, how can I make my kafka stream more reliable? I have seen there is a simple consumer *[2]* but I'm

Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Cheng, I thought spark-sql uses the exact same metastore, right? However, it didn't work as expected. Here's what I did. In spark-shell, I loaded a csv file and registered the table, say countries. Started the thrift server. Connected using beeline. When I run show tables or !tables,

Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
How did you register the table under spark-shell? Two things to notice: 1. To interact with Hive, HiveContext instead of SQLContext must be used. 2. `registerTempTable` doesn't persist the table into the Hive metastore, and the table is lost after quitting spark-shell. Instead, you must use
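
A minimal sketch of the difference under the Spark 1.1/1.2 shell; the countries.csv file, the Country case class and its fields are only placeholders.

    import org.apache.spark.sql.hive.HiveContext

    case class Country(name: String, code: String)

    val hiveContext = new HiveContext(sc)          // sc is the SparkContext provided by spark-shell
    import hiveContext.createSchemaRDD             // implicit conversion RDD[Product] -> SchemaRDD

    val countries = sc.textFile("countries.csv").map(_.split(",")).map(c => Country(c(0), c(1)))

    countries.registerTempTable("countries_tmp")   // visible only in this context, not to beeline
    countries.saveAsTable("countries")             // persisted in the Hive metastore, visible to the Thrift server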

spark 1.1.1 Maven dependency

2014-12-09 Thread sivarani
Dear All, I am using spark streaming. It was working fine when I was using spark 1.0.2, but now I repeatedly get a few issues like class not found. I am using the same pom.xml with the updated version for all spark modules; I am using the spark-core, streaming, and streaming-with-kafka modules. It's

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before the select statement, the application gets stuck; if we don't, it doesn't. Log info: 14/12/09 17:29:33 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread

RE: Issues on schemaRDD's function got stuck

2014-12-09 Thread LinQili
I checked my code again and located the issue: if we do the `load data inpath` before the select statement, the application gets stuck; if we don't, it doesn't. Code that gets stuck: val sqlLoadData = s"LOAD DATA INPATH '$currentFile' OVERWRITE INTO TABLE $tableName"

NoSuchMethodError: writing spark-streaming data to cassandra

2014-12-09 Thread m.sarosh
Hi, I am intending to save the streaming data from kafka into Cassandra using spark-streaming, but there seems to be a problem with the line javaFunctions(data).writerBuilder("testkeyspace", "test_table", mapToRow(TestTable.class)).saveToCassandra(); I am getting a NoSuchMethodError. The code, the

Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Back to the first question, does this mandate that hive is up and running? When I try it, I get the following exception. The documentation says that this method works only on SchemaRDD. I thought that countries.saveAsTable did not work for that reason, so I created a tmp table that contains the results

Re: Mllib native netlib-java/OpenBLAS

2014-12-09 Thread Jaonary Rabarisoa
+1 with 1.3-SNAPSHOT. On Mon, Dec 1, 2014 at 5:49 PM, agg212 alexander_galaka...@brown.edu wrote: Thanks for your reply, but I'm still running into issues installing/configuring the native libraries for MLlib. Here are the steps I've taken, please let me know if anything is incorrect. -

Using S3 block file system

2014-12-09 Thread Paul Colomiets
Hi, I'm trying to use S3 Block file system in spark, i.e. s3:// urls (*not* s3n://). And I always get the following error: Py4JJavaError: An error occurred while calling o3188.saveAsParquetFile. : org.apache.hadoop.fs.s3.S3FileSystemException: Not a Hadoop S3 file. at

Re: NoSuchMethodError: writing spark-streaming data to cassandra

2014-12-09 Thread Gerard Maas
You're using two conflicting versions of the connector: the Scala version at 1.1.0 and the Java version at 1.0.4. I don't use Java, but I guess you only need the java dependency for your job - and with the version fixed. <dependency> <groupId>com.datastax.spark</groupId>

Stack overflow Error while executing spark SQL

2014-12-09 Thread jishnu.prathap
Hi, I am getting a stack overflow error: Exception in thread "main" java.lang.StackOverflowError at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at

Re: spark 1.1.1 Maven dependency

2014-12-09 Thread Sean Owen
Are you using the Commons libs in your app? Then you need to depend on them directly, and not rely on them accidentally being provided by Spark. There is no trial and error; you must declare all the dependencies you use in your own code. Otherwise, perhaps you are not actually running with the

Re: Saving Data only if Dstream is not empty

2014-12-09 Thread Sean Owen
I don't believe you can do this unless you implement the save to HDFS logic yourself. To keep the semantics consistent, these saveAs* methods will always output a file per partition. On Mon, Dec 8, 2014 at 11:53 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi Experts! I want to save
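
As a rough illustration of what that implies, one can guard the save per batch; the `lines` DStream and the output path below are invented.

    // `lines` is assumed to be an existing DStream[String]; skip batches with no data.
    lines.foreachRDD { (rdd, time) =>
      if (rdd.take(1).nonEmpty) {   // cheaper than rdd.count() just to test emptiness
        rdd.saveAsTextFile(s"hdfs:///output/batch-${time.milliseconds}")
      }
    }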

reg JDBCRDD code

2014-12-09 Thread Deepa Jayaveer
Hi All, I am new to Spark. I tried to connect to MySQL using Spark. I want to write the code in Java but am getting a runtime exception. I guess that the issue is with the Function0 and Function1 objects being passed to JdbcRDD. I tried my level best and attached the code; can you please help us to

registerTempTable: Table not found

2014-12-09 Thread Hao Ren
Hi, I am using Spark SQL on 1.2.1-snapshot. Here is the problem I encountered. Basically, I want to save a schemaRDD to HiveContext: val scm = StructType(Seq( StructField("name", StringType, nullable = false), StructField("cnt", IntegerType, nullable = false) )) val schRdd
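
For reference, a minimal completion of that snippet under the Spark 1.1/1.2 API; the input file, the row parsing and the temp table name are made up for illustration.

    import org.apache.spark.sql._
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    val scm = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("cnt", IntegerType, nullable = false)))

    // Build an RDD[Row] matching the schema, then attach the schema and register it.
    val rows = sc.textFile("counts.csv").map(_.split(",")).map(r => Row(r(0), r(1).toInt))
    val schRdd = hiveContext.applySchema(rows, scm)
    schRdd.registerTempTable("counts_tmp")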

Help related to spark streaming usage error

2014-12-09 Thread Saurabh Pateriya
Hi, I am running spark streaming in standalone mode on a twitter source on a single machine (using an HDP virtual box). I receive statuses from the streaming context and I can print them, but when I try to save those statuses as an RDD into Hadoop using rdd.saveAsTextFiles or

Re: registerTempTable: Table not found

2014-12-09 Thread nitin
Looks like this issue has been fixed very recently and should be available in the next RC: http://apache-spark-developers-list.1001551.n3.nabble.com/CREATE-TABLE-AS-SELECT-does-not-work-with-temp-tables-in-1-2-0-td9662.html -- View this message in context:

Specifying number of executors in Mesos

2014-12-09 Thread Gerard Maas
Hi, We have a number of Spark Streaming / Kafka jobs that would benefit from an even spread of consumers over physical hosts in order to maximize network usage. As far as I can see, the Spark Mesos scheduler accepts resource offers until all required Mem + CPU allocation has been satisfied. This

Re: registerTempTable: Table not found

2014-12-09 Thread Hao Ren
Thank you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/registerTempTable-Table-not-found-tp20592p20594.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -

Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
According to the stacktrace, you were still using SQLContext rather than HiveContext. To interact with Hive, HiveContext *must* be used. Please refer to this page http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables On 12/9/14 6:26 PM, Anas Mosaad wrote: Back to the

Unable to start Spark 1.3 after building: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2014-12-09 Thread Daniel Haviv
Hi, I've built spark 1.3 with hadoop 2.6, but when I start up the spark-shell I get the following exception: 14/12/09 06:54:24 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 14/12/09 06:54:24 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 14/12/09

Submit application to spark on mesos cluster

2014-12-09 Thread Han JU
Hi, I have a little problem in submitting our application to a mesos cluster. Basically the mesos cluster is configured and I'm able to have spark-shell working correctly. Then I tried to launch our application jar (an uber jar built with sbt assembly, with all deps): bin/spark-submit --master

Re: Programmatically running spark jobs using yarn-client

2014-12-09 Thread Aniket Bhatnagar
Thanks Akhil. I was wondering why it isn't able to find the class even though it exists in the same class loader as SparkContext. As a workaround, I used the following code to add all dependent jars in a playframework application to the spark context. @tailrec private def

Re: Saving Data only if Dstream is not empty

2014-12-09 Thread Gerard Maas
We have a similar case in which we don't want to save data to Cassandra if the data is empty. In our case, we filter the initial DStream to process messages that go to a given table. To do so, we're using something like this: dstream.foreachRDD{ (rdd, time) => tables.foreach{ table => val
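
A hedged completion of that pattern; `tables`, the `belongsTo` predicate and the keyspace name are placeholders, and the saveToCassandra call assumes the 1.x spark-cassandra-connector Scala API.

    import com.datastax.spark.connector._

    dstream.foreachRDD { (rdd, time) =>
      tables.foreach { table =>
        val forTable = rdd.filter(msg => belongsTo(msg, table))   // keep only this table's messages
        if (forTable.take(1).nonEmpty) {                          // don't write empty batches
          forTable.saveToCassandra("my_keyspace", table)
        }
      }
    }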

Re: Spark on YARN memory utilization

2014-12-09 Thread Denny Lee
Thanks Sandy! On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote: Another thing to be aware of is that YARN will round up containers to the nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. -Sandy On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee

RE: Learning rate or stepsize automation

2014-12-09 Thread Bui, Tri
Thanks! Will try it out. From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: Monday, December 08, 2014 5:13 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Learning rate or stepsize automation Hi Bui, Please use BFGS-based solvers... For BFGS you don't have to specify a step size
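
As a concrete illustration of the suggestion, a short MLlib sketch with the 1.1.x API; the libsvm input path is a placeholder, and LogisticRegressionWithLBFGS stands in for whichever BFGS-based solver fits the actual problem.

    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.util.MLUtils

    val training = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

    // Unlike the SGD-based trainers, there is no stepSize parameter to tune here.
    val model = new LogisticRegressionWithLBFGS().run(training)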

Re: NullPointerException When Reading Avro Sequence Files

2014-12-09 Thread Simone Franzini
Hi Cristovao, I have seen a very similar issue that I have posted about in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html I think your main issue here is somewhat similar, in that the MapWrapper Scala class is not registered. This gets registered

Re: reg JDBCRDD code

2014-12-09 Thread Akhil Das
Hi Deepa, In Scala, you will do something like https://gist.github.com/akhld/ccafb27f098163bea622 With the Java API it will be something like https://gist.github.com/akhld/0d9299aafc981553bc34 Thanks Best Regards On Tue, Dec 9, 2014 at 6:39 PM, Deepa Jayaveer deepa.jayav...@tcs.com wrote: Hi
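
In case the gists move, here is a rough Scala sketch of the same idea; the MySQL URL, credentials, table and the bounds in the WHERE clause are all invented.

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "user", "pass"),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",   // the two ?s are required
      1, 1000, 2,                                                // lowerBound, upperBound, numPartitions
      (rs: ResultSet) => (rs.getInt("id"), rs.getString("name")))

    rdd.collect().foreach(println)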

Re: NullPointerException When Reading Avro Sequence Files

2014-12-09 Thread Simone Franzini
You can use this Maven dependency: <dependency> <groupId>com.twitter</groupId> <artifactId>chill-avro</artifactId> <version>0.4.0</version> </dependency> Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro

Re: PySpark elasticsearch question

2014-12-09 Thread Mohamed Lrhazi
Thanks Nick... still no luck. If I use ?q=somerandomchars&fields=title,_source I get an exception about an empty collection, which seems to indicate it is actually using the supplied es.query, but somehow when I do rdd.take(1) or take(10), all I get is the id and an empty dict, apparently... maybe

Re: PySpark elasticsearch question

2014-12-09 Thread Mohamed Lrhazi
found a format that worked, kind of accidentally: es.query : {"query":{"match_all":{}},"fields":["title","_source"]} Thanks, Mohamed. On Tue, Dec 9, 2014 at 11:27 AM, Mohamed Lrhazi mohamed.lrh...@georgetown.edu wrote: Thanks Nick... still no luck. If I use ?q=somerandomchars&fields=title,_source

Re: NoSuchMethodError: writing spark-streaming data to cassandra

2014-12-09 Thread m.sarosh
Hi, @Gerard - Thanks, I added one more dependency and conf.set("spark.cassandra.connection.host", "localhost"). Now I am able to create a connection, but the data is not getting inserted into the cassandra table, and the logs show it's getting connected and the next second getting

pyspark sc.textFile uses only 4 out of 32 threads per node

2014-12-09 Thread Gautham
I am having an issue with pyspark launched in ec2 (using spark-ec2) with 5 r3.4xlarge machines where each has 32 threads and 240GB of RAM. When I do sc.textFile to load data from a number of gz files, it does not progress as fast as expected. When I log-in to a child node and run top, I see only 4

Re: spark sql - save to Parquet file - Unsupported datatype TimestampType

2014-12-09 Thread Michael Armbrust
Not yet, unfortunately. You could cast the timestamp to a long if you don't need nanosecond precision. Here is a related JIRA: https://issues.apache.org/jira/browse/SPARK-4768 On Mon, Dec 8, 2014 at 11:27 PM, ZHENG, Xu-dong dong...@gmail.com wrote: I meet the same issue. Any solution? On
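
A hedged sketch of that workaround, using HiveQL through a HiveContext since its CAST support is the most predictable; the table, column names and output path are made up, and CAST ... AS BIGINT yields epoch seconds rather than nanoseconds.

    // `events` is assumed to be a table or temp table already visible to hiveContext.
    val withoutTs = hiveContext.sql(
      "SELECT id, CAST(event_time AS BIGINT) AS event_time_sec FROM events")
    withoutTs.saveAsParquetFile("/tmp/events_parquet")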

PySpark and UnsupportedOperationException

2014-12-09 Thread Mohamed Lrhazi
While trying simple examples of PySpark code, I systematically get these failures when I try this. I don't see any prior exceptions in the output... How can I debug further to find the root cause? es_rdd = sc.newAPIHadoopRDD( inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",

Re: PhysicalRDD problem?

2014-12-09 Thread Michael Armbrust
val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema) This line is throwing away the logical information about existingSchemaRDD and thus Spark SQL can't know how to push down projections or predicates past this operator. Can you describe more the problems

Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-09 Thread Michael Armbrust
You might also try out the recently added support for views. On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Ah... I see. Thanks for pointing it out. Then it means we cannot mount external table using customized column names. hmm... Then the only option left is

yarn log on EMR

2014-12-09 Thread Tyson
Dear all, I would like to run a simple spark job on EMR with yarn. My job is as follows: public void EMRRun() { SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster"); sparkConf.set("spark.executor.memory", "13000m"); JavaSparkContext ctx

Caching RDDs with shared memory - bug or feature?

2014-12-09 Thread insperatum
If all RDD elements within a partition contain pointers to a single shared object, Spark persists as expected when the RDD is small. However, if the RDD has more than *200 elements* then Spark reports requiring much more memory than it actually does. This becomes a problem for large RDDs, as Spark

spark shell and hive context problem

2014-12-09 Thread minajagi
Hi, I'm working on the Spark that comes with CDH 5.2.0. I'm trying to get a hive context in the shell and I'm running into problems that I don't understand. I have added hive-site.xml to the conf folder under /usr/lib/spark/conf as indicated elsewhere. Here is what I see. Please help

Re: spark shell and hive context problem

2014-12-09 Thread Marcelo Vanzin
Hello, In CDH 5.2 you need to manually add Hive classes to the classpath of your Spark job if you want to use the Hive integration. Also, be aware that since Spark 1.1 doesn't really support the version of Hive shipped with CDH 5.2, this combination is to be considered extremely experimental. On

spark shell session crashes when trying to obtain hive context

2014-12-09 Thread minajagi
Hi, I'm working on the Spark that comes with CDH 5.2.0. I'm trying to get a hive context in the shell and I'm running into problems that I don't understand. I have added hive-site.xml to the conf folder under /usr/lib/spark/conf as indicated elsewhere. Here is what I see. Please help

implement query to sparse vector representation in spark

2014-12-09 Thread Huang,Jin
I know quite a lot about machine learning, but I am new to scala and spark. I got stuck due to the Spark API, so please advise. I have a txt file where each line has the format #label \t #query, a string of words delimited by spaces: 1 wireless amazon kindle 2 apple iPhone 5 1 kindle fire 8G 2

equivalent to sql in

2014-12-09 Thread dizzy5112
I have an RDD I want to filter, and for a single term all works well: i.e. dataRDD.filter(x => x._2 == "apple"). How can I use multiple values, for example if I wanted to filter my rdd to take out apples and oranges and pears without using . This could get long winded as there may be quite a few. Can

Re: equivalent to sql in

2014-12-09 Thread Malte
This is more a scala specific question. I would look at the List contains implementation -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/equivalent-to-sql-in-tp20599p20600.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: query classification using spark Mlib

2014-12-09 Thread sparkEbay
The format is bad; the question link is here: http://stackoverflow.com/questions/27370170/query-classification-using-apache-spark-mlib -- View this message in context:

RE: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava

2014-12-09 Thread Judy Nash
To report back how I ultimately solved this issue so someone else can do the same: 1) Check each jar's class path and make sure the jars are listed in the order of their Guava class version (i.e. spark-assembly needs to be listed before Hadoop 2.4 because spark-assembly has guava 14 and Hadoop 2.4 has guava 11).

Fwd: Please add us to the Spark users page

2014-12-09 Thread Abhik Majumdar
Hi, My name is Abhik Majumdar and I am a co-founder of Vidora Corp. We use Spark at Vidora to power our machine learning stack and we are requesting to be included on your Powered by Spark page: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark Here is the information you

RE: equivalent to sql in

2014-12-09 Thread Mohammed Guller
Option 1: dataRDD.filter(x => (x._2 == "apple") || (x._2 == "orange")) Option 2: val fruits = Set("apple", "orange", "pear") dataRDD.filter(x => fruits.contains(x._2)) Mohammed -Original Message- From: dizzy5112 [mailto:dave.zee...@gmail.com] Sent: Tuesday, December 9, 2014 2:16 PM To:
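
A tiny self-contained usage example of Option 2; the fruit data is invented.

    val dataRDD = sc.parallelize(Seq((1, "apple"), (2, "kiwi"), (3, "orange"), (4, "pear")))
    val fruits = Set("apple", "orange", "pear")

    // Same effect as SQL's: WHERE fruit IN ('apple', 'orange', 'pear')
    val wanted = dataRDD.filter { case (_, fruit) => fruits.contains(fruit) }
    wanted.collect().foreach(println)   // keeps the apple, orange and pear rows only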

Re: dockerized spark executor on mesos?

2014-12-09 Thread Venkat Subramanian
We have dockerized Spark Master and worker(s) separately and are using it in our dev environment. We don't use Mesos though, running it in Standalone mode, but adding Mesos should not be that difficult I think. Regards Venkat -- View this message in context:

Cluster getting a null pointer error

2014-12-09 Thread Eric Tanner
I have set up a cluster on AWS and am trying a really simple hello world program as a test. The cluster was built using the ec2 scripts that come with Spark. Anyway, I have output the error message (using --verbose) below. The source code is further below that. Any help would be greatly

Reading Yarn log on EMR

2014-12-09 Thread Nagy István
Dear all, I would like to run a simple spark job on EMR with yarn. My job is as follows: public void EMRRun() { SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster"); sparkConf.set("spark.executor.memory", "13000m"); JavaSparkContext ctx

Re: Reading Yarn log on EMR

2014-12-09 Thread Adam Diaz
https://issues.apache.org/jira/browse/YARN-321 There is not a generic application history server yet. The current one works for MR. On Tue, Dec 9, 2014 at 4:48 PM, Nagy István tyson...@gmail.com wrote: Dear all, I would like to run a simple spark job on EMR with yarn. My job is the

Workers keep dying on EC2 Spark cluster: PriviledgedActionException

2014-12-09 Thread Jeff Schecter
Hi Spark users, I've been attempting to get flambo https://github.com/yieldbot/flambo/blob/develop/README.md, a Clojure library for Spark, working with my codebase. After getting things to build with this very simple interface: (ns sharknado.core (:require [flambo.conf :as conf]

Can HiveContext be used without using Hive?

2014-12-09 Thread Manoj Samel
From the 1.1.1 documentation, it seems one can use HiveContext instead of SQLContext without having a Hive installation. The benefit is a richer SQL dialect. Is my understanding correct? Thanks

Re: Can HiveContext be used without using Hive?

2014-12-09 Thread Michael Armbrust
That is correct. The hive context will create an embedded metastore in the current directory if you have not configured hive. On Tue, Dec 9, 2014 at 5:51 PM, Manoj Samel manojsamelt...@gmail.com wrote: From the 1.1.1 documentation, it seems one can use HiveContext instead of SQLContext without

Re: Can HiveContext be used without using Hive?

2014-12-09 Thread Anas Mosaad
In that case, what should be the behavior of saveAsTable? On Dec 10, 2014 4:03 AM, Michael Armbrust mich...@databricks.com wrote: That is correct. The hive context will create an embedded metastore in the current directory if you have not configured hive. On Tue, Dec 9, 2014 at 5:51 PM,

RE: Can HiveContext be used without using Hive?

2014-12-09 Thread Cheng, Hao
It works exactly like Create Table As Select (CTAS) in Hive. Cheng Hao From: Anas Mosaad [mailto:anas.mos...@incorta.com] Sent: Wednesday, December 10, 2014 11:59 AM To: Michael Armbrust Cc: Manoj Samel; user@spark.apache.org Subject: Re: Can HiveContext be used without using Hive? In that case,
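
A rough illustration of the analogy under an unconfigured HiveContext (it will create metastore_db/ in the current directory); the table names are placeholders.

    // Given an existing table `countries`, calling someSchemaRDD.saveAsTable("countries_copy")
    // has roughly the same effect as running this CTAS statement through the HiveContext:
    hiveContext.sql("CREATE TABLE countries_copy AS SELECT * FROM countries")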

Mllib error

2014-12-09 Thread amin mohebbi
I'm trying to build a very simple scala standalone app using MLlib, but I get the following error when trying to build the program: object mllib is not a member of package org.apache.spark. Please note I just migrated from 1.0.2 to 1.1.1. Best Regards

Re: PhysicalRDD problem?

2014-12-09 Thread Nitin Goyal
Hi Michael, I think I have found the exact problem in my case. I see that we have written something like the following in Analyzer.scala: // TODO: pass this in as a parameter. val fixedPoint = FixedPoint(100) and Batch("Resolution", fixedPoint, ResolveReferences ::

Stack overflow Error while executing spark SQL

2014-12-09 Thread Jishnu Prathap
Hi, I am getting a stack overflow error: Exception in thread "main" java.lang.StackOverflowError at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at

Re: reg JDBCRDD code

2014-12-09 Thread Deepa Jayaveer
Thanks Akhil, but it is expecting Function1 instead of Function. I tried writing a new class implementing Function1 but got an error. Can you please help us to get it resolved? The JdbcRDD is created as JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1, getResultset,

Re: PySpark and UnsupportedOperationException

2014-12-09 Thread Mohamed Lrhazi
Somehow I now get a better error trace, I believe for the same root issue... any advice on how to narrow this down further is highly appreciated: ... 14/12/10 07:15:03 ERROR PythonRDD: Python worker exited unexpectedly (crashed) org.apache.spark.api.python.PythonException: Traceback (most recent

Re: reg JDBCRDD code

2014-12-09 Thread Akhil Das
Try changing this line JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1, getResultset, ClassTag$.MODULE$.apply(String.class)); to JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 100, 1, getResultset,

Re: reg JDBCRDD code

2014-12-09 Thread Deepa Jayaveer
Hi Akhil, I am getting the same error. I guess the issue is in the Function1 implementation. Is it enough if we override the apply method in the Function1 class? Thanks Deepa From: Akhil Das ak...@sigmoidanalytics.com To: Deepa Jayaveer deepa.jayav...@tcs.com Cc: user@spark.apache.org

Re: PhysicalRDD problem?

2014-12-09 Thread Nitin Goyal
I see that somebody has already raised a PR for this, but it hasn't been merged: https://issues.apache.org/jira/browse/SPARK-4339 Can we merge this in the next 1.2 RC? Thanks -Nitin On Wed, Dec 10, 2014 at 11:50 AM, Nitin Goyal nitin2go...@gmail.com wrote: Hi Michael, I think I have found the

Re: PySpark and UnsupportedOperationException

2014-12-09 Thread Davies Liu
On Tue, Dec 9, 2014 at 11:32 AM, Mohamed Lrhazi mohamed.lrh...@georgetown.edu wrote: While trying simple examples of PySpark code, I systematically get these failures when I try this. I don't see any prior exceptions in the output... How can I debug further to find the root cause? es_rdd =

Re: Spark SQL Stackoverflow error

2014-12-09 Thread Jishnu Prathap
Hi, Unfortunately I am also getting the same error. Has anybody solved it? Exception in thread "main" java.lang.StackOverflowError at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)