Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
Unfortunately I cannot at this moment (not a decision I can make) :( On Wed, Jan 20, 2016 at 6:46 PM Ted Yu wrote: > I am not aware of a workaround. > > Can you upgrade to a 0.98.4+ release? > > Cheers > > On Wed, Jan 20, 2016 at 6:26 PM, Ajinkya Kale

--driver-java-options does not support multiple JVM configurations?

2016-01-20 Thread our...@cnsuning.com
hi all; --driver-java-options does not support multiple JVM configuration options. The submit is as follows: Cores=16 sparkdriverextraJavaOptions="-XX:NewSize=2096m -XX:MaxPermSize=512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseParNewGC -XX:+UseConcMarkSweepGC

HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
I have posted this on the hbase user list but I thought it makes more sense on the spark user list. I am able to read the table in yarn-client mode from spark-shell, but I have exhausted all online forums for options to get it working in yarn-cluster mode through spark-submit. I am using this

Re: Re: --driver-java-options does not support multiple JVM configurations?

2016-01-20 Thread our...@cnsuning.com
Marcelo, the error also exists with quotes around "$sparkdriverextraJavaOptions": Unrecognized VM option

Re: Spark + Sentry + Kerberos don't add up?

2016-01-20 Thread Ruslan Dautkhanov
I took the liberty of creating an issue: https://github.com/cloudera/livy/issues/36 Feel free to close it if it doesn't belong to the Livy project. I really don't know if this is a Spark or a Livy/Sentry problem. Any ideas for possible workarounds? Thank you. -- Ruslan Dautkhanov On Mon, Jan 18, 2016 at

Re: spark task scheduling delay

2016-01-20 Thread Renu Yadav
Any suggestions? On Wed, Jan 20, 2016 at 6:50 PM, Renu Yadav wrote: > Hi, > > I am facing a spark task scheduling delay issue in spark 1.4. > > Suppose I have 1600 tasks running; then 1550 tasks run fine, but for the > remaining 50 I am facing task delay even if the input

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ted Yu
0.98.0 didn't have the fix from HBASE-8 Please upgrade your hbase version and try again. If there is still a problem, please pastebin the stack trace. Thanks On Wed, Jan 20, 2016 at 5:41 PM, Ajinkya Kale wrote: > > I have posted this on the hbase user list but I thought

Re: Window Functions importing issue in Spark 1.4.0

2016-01-20 Thread satish chandra j
Hi Ted, Thanks for sharing the link to the rowNumber usage example. Could you please let me know if I can use the rowNumber window function in my current Spark 1.4.0 version? If yes, then why am I getting an error in "import org.apache.spark.sql.expressions.Window" and "import

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
Hi Ted, Thanks for responding. Is there a workaround for 0.98.0? Adding the hbase-protocol jar to HADOOP_CLASSPATH didn't work for me. On Wed, Jan 20, 2016 at 6:14 PM Ted Yu wrote: > 0.98.0 didn't have the fix from HBASE-8 > > Please upgrade your hbase version and try

Re: --driver-java-options does not support multiple JVM configurations?

2016-01-20 Thread Marcelo Vanzin
On Wed, Jan 20, 2016 at 7:38 PM, our...@cnsuning.com wrote: > --driver-java-options $sparkdriverextraJavaOptions \ You need quotes around "$sparkdriverextraJavaOptions". -- Marcelo

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ted Yu
I am not aware of a workaround. Can you upgrade to a 0.98.4+ release? Cheers On Wed, Jan 20, 2016 at 6:26 PM, Ajinkya Kale wrote: > Hi Ted, > > Thanks for responding. > Is there a workaround for 0.98.0? Adding the hbase-protocol jar to > HADOOP_CLASSPATH didn't work for

best practice : how to manage your Spark cluster ?

2016-01-20 Thread charles li
I've posted a thread before: pre-install third-party Python packages on a Spark cluster. Currently I use *Fabric* to manage my cluster, but it's not enough for me, and I believe there is a much better way to *manage and monitor* the cluster. I believe there really exist some open source management tools

Re: retrieve cell value from a rowMatrix.

2016-01-20 Thread zhangjp
Use the apply(i,j) function. Do you know how to save a matrix to a file using Java? -- Original message -- From: "Srivathsan Srinivas"; Date: 2016-01-21 (Thu) 9:04; To: "user"; Subject:
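
For reference, a minimal sketch of both points, assuming an MLlib RowMatrix (apply(i,j) exists on a local Matrix, not on the distributed RowMatrix, so a distributed lookup has to go through the rows RDD; the name mat and the save path are illustrative):

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Pair each row with its index, keep row i, then index into that
    // row's vector at column j.
    def cellAt(mat: RowMatrix, i: Long, j: Int): Double =
      mat.rows.zipWithIndex().filter { case (_, idx) => idx == i }.first()._1(j)

    // Saving: one text line per row vector, comma-separated.
    // mat.rows.map(_.toArray.mkString(",")).saveAsTextFile("hdfs:///out/matrix")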

Re: visualize data from spark streaming

2016-01-20 Thread Silvio Fiorito
You’ve got a few options: * Use a notebook tool such as Zeppelin, Jupyter, or Spark Notebook to write up some visualizations which update in time with your streaming batches * Use Spark Streaming to push your batch results to another 3rd-party system with a BI tool that supports

Re: Create a n x n graph given only the vertices no

2016-01-20 Thread praveen S
Hi Robin, I am using Spark 1.3 and I am not able to find the API Graph.fromEdgeTuples(edge RDD, 1) Regards, Praveen Well you can use a similar technique to generate an RDD[(Long, Long)] (that’s what the edges variable is) and then create the Graph using Graph.fromEdgeTuples.
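
A minimal sketch of the technique Robin describes, assuming GraphX and a spark-shell sc (the fully connected case; n and the default edge attribute 1 are illustrative):

    import org.apache.spark.graphx.Graph

    val n = 100L
    // Generate every directed edge (i, j) with i != j as an RDD[(Long, Long)]...
    val edges = sc.parallelize(0L until n).flatMap { i =>
      (0L until n).collect { case j if j != i => (i, j) }
    }
    // ...then build the graph; the second argument is the default edge attribute.
    val graph = Graph.fromEdgeTuples(edges, 1)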

Re: Create a n x n graph given only the vertices no

2016-01-20 Thread praveen S
Sorry.. Found the api.. On 21 Jan 2016 10:17, "praveen S" wrote: > Hi Robin, > > I am using Spark 1.3 and I am not able to find the api > Graph.fromEdgeTuples(edge RDD, 1) > > Regards, > Praveen > Well you can use a similar tech to generate an RDD[(Long, Long)] (that’s >

Re: Parquet write optimization by row group size config

2016-01-20 Thread Akhil Das
It would be good if you could share the code; someone here or I can guide you better if you post the code snippet. Thanks Best Regards On Wed, Jan 20, 2016 at 10:54 PM, Pavel Plotnikov < pavel.plotni...@team.wrike.com> wrote: > Thanks, Akhil! It helps, but this job is still not fast enough,

Re: a lot of warnings when building spark 1.6.0

2016-01-20 Thread Eli Super
Thanks Sean In the command: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dscala-2.10 -Phive -Phive-thriftserver -DskipTests clean package is the string -Phadoop-2.4 -Dhadoop.version=2.4.0 a kind of duplication? Can I use only one string to define the hadoop version? And I don't have hadoop, I

Re: spark task scheduling delay

2016-01-20 Thread Stephen Boesch
Which Resource Manager are you using? 2016-01-20 21:38 GMT-08:00 Renu Yadav : > Any suggestions? > > On Wed, Jan 20, 2016 at 6:50 PM, Renu Yadav wrote: > >> Hi , >> >> I am facing spark task scheduling delay issue in spark 1.4. >> >> suppose I have 1600

a lot of warnings when building spark 1.6.0

2016-01-20 Thread Eli Super
Hi I get WARNINGS when I try to build spark 1.6.0; overall I get a SUCCESS message on all projects. The command I used: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dscala-2.10 -Phive -Phive-thriftserver -DskipTests clean package From pom.xml: 2.10.5 / 2.10. Example of warnings: [INFO]

Re: How to call a custom function from GroupByKey which takes Iterable[Row] as input and returns a Map[Int,String] as output in scala

2016-01-20 Thread Neha Mehta
Hi Vishal, Thanks for the solution. I was able to get it working for my scenario. Regarding the Task not serializable error, I still get it when I declare the function outside the main method. However, if I declare it inside main as "val func = {}", it works fine for me. In case you have any
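
For anyone hitting the same thing, a small sketch of the difference (hypothetical names; the point is that a local function value serializes on its own, while a method on a non-serializable enclosing class drags `this` into the closure):

    import org.apache.spark.sql.Row

    object Job {
      def main(args: Array[String]): Unit = {
        // Works: a local function value captures no enclosing instance.
        val func: Iterable[Row] => Map[Int, String] =
          rows => rows.map(r => r.getInt(0) -> r.getString(1)).toMap
        // groupedRdd.mapValues(func) ...
      }
    }

    // Fails with "Task not serializable" when defined on a class that is
    // not Serializable, because calling it in a closure captures `this`:
    class Helper {
      def method1(rows: Iterable[Row]): Map[Int, String] =
        rows.map(r => r.getInt(0) -> r.getString(1)).toMap
    }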

Re: a lot of warnings when building spark 1.6.0

2016-01-20 Thread Sean Owen
These are just warnings. Most are unavoidable given the version of Hadoop supported vs what you build with. On Thu, Jan 21, 2016, 08:08 Eli Super wrote: > Hi > > I get WARNINGS when try to build spark 1.6.0 > > overall I get SUCCESS message on all projects > > command I

Re: Parquet write optimization by row group size config

2016-01-20 Thread Jörn Franke
What is your data size, the algorithm, and the expected time? Depending on this, the group can recommend optimizations or tell you that the expectations are wrong. > On 20 Jan 2016, at 18:24, Pavel Plotnikov > wrote: > > Thanks, Akhil! It helps, but this job is

Re: trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem

2016-01-20 Thread Andy Davidson
Very nice! Many thanks, Kevin. I wish I had found this out a couple of weeks ago. Andy From: Kevin Mellott Date: Wednesday, January 20, 2016 at 4:34 PM To: Andrew Davidson Cc: "user @spark" Subject: Re: trouble

retrieve cell value from a rowMatrix.

2016-01-20 Thread Srivathsan Srinivas
Hi, Is there a way to retrieve the cell value of a rowMatrix, like m(i,j)? The docs say that the indices are long. Maybe I am doing something wrong... but there doesn't seem to be any such direct method. Any suggestions? -- Thanks, Srini.

Getting all field value as Null while reading Hive Table with Partition

2016-01-20 Thread Bijay Pathak
Hello, I am getting all the value of field as NULL while reading Hive Table with Partition in SPARK 1.5.0 running on CDH5.5.1 with YARN (Dynamic Allocation). Below is the command I used in Spark_Shell: import org.apache.spark.sql.hive.HiveContext val hiveContext = new HiveContext(sc) val

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Raghu Ganti
Ah, OK! I am a novice to Scala - will take a look at Scala case classes. It would be awesome if you can provide some pointers. Thanks, Raghu On Wed, Jan 20, 2016 at 12:25 PM, Andy Grove wrote: > I'm talking about implementing CustomerRecord as a scala case class, >

Re: Parquet write optimization by row group size config

2016-01-20 Thread Pavel Plotnikov
Thanks, Akhil! It helps, but this job is still not fast enough; maybe I missed something. Regards, Pavel On Wed, Jan 20, 2016 at 9:51 AM Akhil Das wrote: > Did you try re-partitioning the data before doing the write? > > Thanks > Best Regards > > On Tue, Jan 19, 2016

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
The analog to PairRDD is a GroupedDataset (created by calling groupBy), which offers similar functionality, but doesn't require you to construct new objects that are in the form of key/value pairs. It doesn't matter if they are complex objects, as long as you can create an encoder for them
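
A minimal sketch against the Spark 1.6 Dataset API, assuming a spark-shell sqlContext (the case class and grouping key are illustrative):

    import sqlContext.implicits._

    case class Student(name: String, region: Int)

    val ds = sqlContext.createDataset(Seq(Student("a", 1), Student("b", 2)))
    // groupBy with a key function yields a GroupedDataset[Int, Student];
    // mapGroups then sees (key, iterator of values) directly, with no
    // key/value pair objects to construct.
    val counts = ds.groupBy(_.region).mapGroups {
      (region, students) => (region, students.length)
    }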

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
Thanks - this helps a lot except for the issue of looking at schools in neighboring regions On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust wrote: > The analog to PairRDD is a GroupedDataset (created by calling groupBy), > which offers similar functionality, but

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Andy Grove
Catalyst is expecting a class that implements scala.Row or scala.Product and is instead finding a Java class. I've run into this issue a number of times. DataFrame doesn't work so well with Java. Here's a blog post with more information on this:

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Andy Grove
I'm talking about implementing CustomerRecord as a scala case class, rather than as a Java class. Scala case classes implement the scala.Product trait, which Catalyst is looking for. Thanks, Andy. -- Andy Grove Chief Architect AgilData - Simple Streaming SQL that Scales www.agildata.com On

Re: Using Spark, SparkR and Ranger, please help.

2016-01-20 Thread Ted Yu
The tail of the stack trace seems to be chopped off. Can you include the whole trace? Which version of Spark / Hive / Ranger are you using? Cheers On Wed, Jan 20, 2016 at 9:42 AM, Julien Carme wrote: > Hello, > > I have been able to use Spark with Apache Ranger. I

How to debug join operations on a cluster.

2016-01-20 Thread Borislav Iordanov
Hi, I'm reading data from HBase using the latest (2.0.0-SNAPSHOT) Hbase-Spark integration module. HBase is deployed on a cluster of 3 machines and Spark is deployed as a Standalone cluster on the same machines. I am doing a join between two JavaPairRDDs that are constructed from two

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Andy Grove
I would walk through a Spark tutorial in Scala. It will be the best way to learn this. In brief though, a Scala case class is like a Java bean / pojo but has a more concise syntax (no getters/setters). case class Person(firstName: String, lastName: String, age: Int) Thanks, Andy. -- Andy

Using Spark, SparkR and Ranger, please help.

2016-01-20 Thread Julien Carme
Hello, I have been able to use Spark with Apache Ranger. I added the right configuration files to the Spark conf, I added the Ranger jars to the classpath, and it works: Spark complies with Ranger rules when I access Hive tables. However with SparkR it does not work, which is rather surprising considering

Re: How to use scala.math.Ordering in java

2016-01-20 Thread Ted Yu
Please take a look at the following files for some examples: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java Cheers On Wed, Jan 20, 2016 at 1:03 AM, ddav

Looking for the best tool that support structured DB and fast text indexing and searching with Spark

2016-01-20 Thread Khaled Al-Gumaei
Hello, I would like to do some calculations of *Term Frequency* and *Document Frequency* using Spark. BUT, I need my input to come from a database table (rows and columns) and the output to also go to a database table. Which kind of tool would I use for the purpose of (*supporting DB tables* and *fast

Re: updateStateByKey not persisting in Spark 1.5.1

2016-01-20 Thread Shixiong(Ryan) Zhu
Could you share your log? On Wed, Jan 20, 2016 at 7:55 AM, Brian London wrote: > I'm running a streaming job that has two calls to updateStateByKey. When > run in standalone mode both calls to updateStateByKey behave as expected. > When run on a cluster, however, it

Dataframe, Spark SQL - Drops First 8 Characters of String on Amazon EMR

2016-01-20 Thread awzurn
Hello, I'm doing some work on Amazon's EMR cluster, and am noticing some peculiar results when using both DataFrames to procure and operate on data, and also when using Spark SQL within Zeppelin to run graphs/reports. Particularly, I'm noticing that when using either of these on the EMR running

updateStateByKey not persisting in Spark 1.5.1

2016-01-20 Thread Brian London
I'm running a streaming job that has two calls to updateStateByKey. When run in standalone mode both calls to updateStateByKey behave as expected. When run on a cluster, however, it appears that the first call is not being checkpointed as shown in this DAG image: http://i.imgur.com/zmQ8O2z.png
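
For context, a minimal sketch of that job shape, assuming a pair DStream named pairs and a StreamingContext named ssc (the state function and checkpoint path are illustrative; updateStateByKey requires checkpointing to be enabled):

    // Stateful DStream operations require a checkpoint directory.
    ssc.checkpoint("hdfs:///checkpoints/app")

    val update: (Seq[Int], Option[Int]) => Option[Int] =
      (values, state) => Some(values.sum + state.getOrElse(0))

    // Two chained stateful stages, as in the report; both lineages are
    // cut by the periodic checkpointing enabled above.
    val first = pairs.updateStateByKey(update)
    val second = first.map { case (k, v) => (k, v * 2) }.updateStateByKey(update)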

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Raghu Ganti
Case classes where? On Wed, Jan 20, 2016 at 12:21 PM, Andy Grove wrote: > Honestly, moving to Scala and using case classes is the path of least > resistance in the long term. > > > > Thanks, > > Andy. > > -- > > Andy Grove > Chief Architect > AgilData - Simple Streaming

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Raghu Ganti
Is it not internal to the Catalyst implementation? I should not have to modify the Spark source to get things to work, should I? :-) On Wed, Jan 20, 2016 at 12:21 PM, Raghu Ganti wrote: > Case classes where? > > On Wed, Jan 20, 2016 at 12:21 PM, Andy Grove

launching app using SparkLauncher

2016-01-20 Thread seemanto.barua
Hi, I have a question on org.apache.spark.launcher.SparkLauncher: how is the JavaSparkContext made available to the main class? -regards Seemanto Barua
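
For context, SparkLauncher only spawns a spark-submit child process; no context crosses that boundary, and the main class creates its own, just as it would under plain spark-submit. A sketch (paths and class names are illustrative):

    import org.apache.spark.launcher.SparkLauncher

    // Launcher side: starts the application as a separate process.
    val process = new SparkLauncher()
      .setAppResource("/path/to/app.jar")   // illustrative path
      .setMainClass("com.example.MyApp")    // illustrative class
      .setMaster("yarn-cluster")
      .launch()
    process.waitFor()

    // Application side (inside com.example.MyApp.main):
    // val sc = new JavaSparkContext(new SparkConf().setAppName("MyApp"))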

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Raghu Ganti
Thanks for your reply, Andy. Yes, that is what I concluded based on the stack trace. The problem stems from Java's implementation of generics, but I thought this would go away if you compiled against Java 1.8, which solves the issue of proper generics implementation. Any ideas? Also, are you

Re: using spark context in map function Task not serializable error

2016-01-20 Thread Giri P
method1 looks like this: reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir) (reRDD has userIds) def method1(sc: SparkContext, userId: String): String = { sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId) ...do something... "Test" } On Wed, Jan 20, 2016 at 11:00 AM,

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
Yeah, that's tough. Perhaps you could do something like a flatMap and emit multiple virtual copies of each student for each region that neighbors their actual region. On Wed, Jan 20, 2016 at 10:50 AM, Steve Lewis wrote: > Thanks - this helps a lot except for the issue
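
A sketch of that flatMap, assuming a Student case class with a region field and a hypothetical neighbors lookup from a region to its adjacent regions:

    // Emit one virtual copy of each student per candidate region (their
    // own plus each neighboring one), so a later grouping by region also
    // pairs students with nearby schools.
    val candidates = students.flatMap { s =>
      (s.region +: neighbors(s.region)).map(r => s.copy(region = r))
    }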

Re: Container exited with a non-zero exit code 1 - Spark Job on YARN

2016-01-20 Thread Shixiong(Ryan) Zhu
Could you share your log? On Wed, Jan 20, 2016 at 5:37 AM, Siddharth Ubale < siddharth.ub...@syncoms.com> wrote: > > > Hi, > > > > I am running a Spark Job on the yarn cluster. > > The spark job is a spark streaming application which is reading JSON from > a kafka topic , inserting the JSON

Re: updateStateByKey not persisting in Spark 1.5.1

2016-01-20 Thread Ted Yu
This is related: SPARK-6847 FYI On Wed, Jan 20, 2016 at 7:55 AM, Brian London wrote: > I'm running a streaming job that has two calls to updateStateByKey. When > run in standalone mode both calls to updateStateByKey behave as expected. > When run on a cluster,

Re: using spark context in map function Task not serializable error

2016-01-20 Thread Shixiong(Ryan) Zhu
You should not use SparkContext or RDD directly in your closures. Could you show the code of "method1"? Maybe you only need a join or something else. E.g., val cassandraRDD = sc.cassandraTable("keySpace", "tableName") reRDD.join(cassandraRDD).map().saveAsTextFile(outputDir) On Tue, Jan 19,
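
A slightly fuller sketch of that rewrite for the earlier snippet, assuming the spark-cassandra-connector (table and column names follow the snippet above; the output format is illustrative):

    import com.datastax.spark.connector._

    // Key the Cassandra rows by userid and reRDD by its own values, then
    // join and save; no SparkContext is referenced inside any closure.
    val byUser = sc.cassandraTable("Keyspace", "Table2")
      .keyBy(row => row.getString("userid"))
    reRDD.keyBy(x => x)
      .join(byUser)
      .map { case (userId, (_, row)) => userId + "\t" + row }
      .saveAsTextFile(outputDir)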

visualize data from spark streaming

2016-01-20 Thread patcharee
Hi, How can I visualize realtime data (in graphs/charts) from Spark Streaming? Any tools? Best, Patcharee

Re: visualize data from spark streaming

2016-01-20 Thread Vinay Shukla
Or you can use Zeppelin notebook to visualize Spark Streaming. See https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2hvcnRvbndvcmtzLWdhbGxlcnkvemVwcGVsaW4tbm90ZWJvb2tzL21hc3Rlci8yQjUyMlYzWDgvbm90ZS5qc29u and other examples

Cache table as

2016-01-20 Thread Younes Naguib
Hi all, I'm connected to the thrift server using beeline on Spark 1.6. I used: cache table tbl as select * from table1 I see table1 in the storage memory and I can use it. But when I reconnect, I can't query it anymore. I get: Error: org.apache.spark.sql.AnalysisException: Table not found:

Re: Scala MatchError in Spark SQL

2016-01-20 Thread Andy Grove
Honestly, moving to Scala and using case classes is the path of least resistance in the long term. Thanks, Andy. -- Andy Grove Chief Architect AgilData - Simple Streaming SQL that Scales www.agildata.com On Wed, Jan 20, 2016 at 10:19 AM, Raghu Ganti wrote: > Thanks

I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
We have been working on a large search problem which we have been solving in the following way. We have two sets of objects, say children and schools. The objective is to find the closest school to each child. There is a distance measure, but it is relatively expensive and would be very costly to apply

Re: trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem

2016-01-20 Thread Andy Davidson
For clarity, callUDF() is not defined on DataFrames. It is defined on org.apache.spark.sql.functions. Strange that the class name starts with lower case. I have not figured out how to use the functions class. http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html Andy From:

trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem

2016-01-20 Thread Andy Davidson
I am using 1.6.0. I am having trouble implementing a custom transformer derived from org.apache.spark.ml.Transformer in Java that I can use in a Pipeline. So far the only way I have figured out to implement any kind of complex functionality and have it applied to a DataFrame is to implement a UDF.

Re: trouble implementing complex transformer in java that can be used with Pipeline. Scala to Java porting problem

2016-01-20 Thread Kevin Mellott
Hi Andy, According to the API documentation for DataFrame, you should have access to *sqlContext* as a property off of the DataFrame instance. In your example, you could then do something like:
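
A small sketch of what that enables inside a transformer, assuming Spark 1.6 (the UDF name and column are illustrative):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.callUDF

    def transform(df: DataFrame): DataFrame = {
      // The DataFrame carries its own context, so no SQLContext needs to
      // be passed in from outside.
      df.sqlContext.udf.register("strLen", (s: String) => s.length)
      df.withColumn("len", callUDF("strLen", df("text")))  // "text" is illustrative
    }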

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-20 Thread fightf...@163.com
OK. I am trying to use the jdbc read datasource with predicates like the following: sqlContext.read.jdbc(url, table, Array("foo = 1", "foo = 3"), props) I can see that the task goes to 62 partitions. But I still get an exception and the parquet file did not write successfully. Do I need to
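
For reference, a sketch of that predicate overload (the URL and credentials are illustrative; each array element becomes the WHERE clause of exactly one partition of the read):

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "dbuser")       // illustrative
    props.setProperty("password", "secret")   // illustrative

    // One partition per predicate: this read has exactly two partitions,
    // WHERE foo = 1 and WHERE foo = 3 respectively.
    val df = sqlContext.read.jdbc(
      "jdbc:mysql://host:3306/db", "table",
      Array("foo = 1", "foo = 3"), props)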

Re: spark-1.2.0--standalone-ha-zookeeper

2016-01-20 Thread Paul Leclercq
Hi Raghvendra and Spark users, I also have trouble activating my standby master when my first master is shut down (via ./sbin/stop-master.sh or via an instance shutdown) and I just want to share my thoughts with you. To answer your question Raghvendra: in *spark-env.sh*, if 2 IPs are set for

How to use scala.math.Ordering in java

2016-01-20 Thread ddav
Hi, I am writing my Spark application in Java and I need to use a RangePartitioner. JavaPairRDD progRef1 = sc.textFile(programReferenceDataFile, 12).filter( (String s) -> !s.startsWith("#")).mapToPair(

Re: Redundant common columns of nature full outer join

2016-01-20 Thread Michael Armbrust
If you use the join that takes USING columns it should automatically coalesce (take the non null value from) the left/right columns: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L405 On Tue, Jan 19, 2016 at 10:51 PM, Zhong Wang
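
A minimal sketch of that overload in Spark 1.6 (left, right, and the column name are illustrative):

    // With usingColumns, "id" appears once in the output, coalesced
    // across sides on a full outer join, instead of as duplicate
    // left/right columns.
    val joined = left.join(right, Seq("id"), "outer")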

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-20 Thread 刘虓
Hi, I think you can view the Spark job UI to find out whether the partitioning works or not; pay attention to the partition size on the storage page and to which stage/task fails. 2016-01-20 16:25 GMT+08:00 fightf...@163.com : > OK. I am trying to use the jdbc read datasource with

Re: Re: spark dataframe jdbc read/write using dbcp connection pool

2016-01-20 Thread fightf...@163.com
OK. I can see from the Spark job UI that more partitions are actually used when I use predicates. But each task then failed with the same error message: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet successfully received from the server was

How to query data in tachyon with spark-sql

2016-01-20 Thread Sea
Hi all, I want to mount some Hive tables in Tachyon, but I don't know how to query data in Tachyon with spark-sql. Does anyone know how?

Re: spark-1.2.0--standalone-ha-zookeeper

2016-01-20 Thread Raghvendra Singh
Thanks Paul. Your reply prevented me from looking in the wrong direction, but I am back to my original problem with zookeeper: "Leadership had been revoked master shutting down" Can anyone provide some feedback or add to this? Regards Raghvendra On 20-Jan-2016 2:31 pm, "Paul Leclercq"

Container exited with a non-zero exit code 1 - Spark Job on YARN

2016-01-20 Thread Siddharth Ubale
Hi, I am running a Spark job on the yarn cluster. The Spark job is a Spark Streaming application which is reading JSON from a kafka topic, inserting the JSON values into hbase tables via Phoenix, and then sending out certain messages to a websocket if the JSON satisfies a certain criterion.

Re: Appending filename information to RDD initialized by sc.textFile

2016-01-20 Thread Femi Anthony
Thanks, I'll take a look. On Wed, Jan 20, 2016 at 1:38 AM, Akhil Das wrote: > You can use sc.newAPIHadoopFile and pass your own InputFormat and > RecordReader, which will read the compressed .gz files for your use case. For > a start, you can look at the: > > -
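
Alongside the custom-InputFormat route, a simpler sketch for cases where each file fits in memory on one task: wholeTextFiles already pairs every file's path with its contents (the path glob is illustrative):

    // RDD of (path, fileContents) pairs; the flatMap re-attaches the
    // filename to every line.
    val withNames = sc.wholeTextFiles("hdfs:///data/input/*")
    val lines = withNames.flatMap { case (path, text) =>
      text.split("\n").map(line => (path, line))
    }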

Scala MatchError in Spark SQL

2016-01-20 Thread raghukiran
Hi, I created a custom UserDefinedType in Java as follows: SQLPoint = new UserDefinedType() { //overriding serialize, deserialize, sqlType, userClass functions here } When creating a dataframe, I am following the manual mapping, I have a constructor for JavaPoint - JavaPoint(double x, double y)