Re: ORC read issue in Spark

2015-11-18 Thread Reynold Xin
What do you mean by "starts delay scheduling"? Are you saying it is no longer doing local reads? If that's the case, you can increase the spark.locality.wait timeout. On Wednesday, November 18, 2015, Renu Yadav wrote: > Hi, > I am using Spark 1.4.1 and saving an ORC file using >
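For context, delay scheduling is governed by Spark's locality-wait settings; below is a minimal sketch of raising them, assuming spark.locality.wait is the knob intended here and with values chosen purely for illustration:

  import org.apache.spark.{SparkConf, SparkContext}

  // Delay scheduling: a task waits this long for a data-local slot before
  // falling back to a less-local one; raising it favors local ORC reads.
  val conf = new SparkConf()
    .setAppName("orc-read")
    .set("spark.locality.wait", "10s")      // global window, default 3s
    .set("spark.locality.wait.node", "10s") // node-local step specifically
  val sc = new SparkContext(conf)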

Re: How to add a built-in geometry type to SparkSQL?

2015-11-18 Thread Reynold Xin
Have you looked into https://github.com/harsha2010/magellan ? On Wednesday, November 18, 2015, ddcd wrote: > Hi all, > > I'm considering adding a geometry type to SparkSQL. > > I know that there is a project named sparkGIS, > which is an

Re: Support for local disk columnar storage for DataFrames

2015-11-18 Thread Cristian O
Hi, While these OSS efforts are interesting, they are for now quite unproven. Personally, I would be much more interested in seeing Spark move incrementally toward supporting updatable DataFrames on various storage substrates, starting locally, perhaps as an extension of cached DataFrames.

Spark Summit East 2016 CFP - Closing in 5 days

2015-11-18 Thread Scott walent
Hi Spark Devs and Users, The CFP for Spark Summit East 2016 (https://spark-summit.org) closes this weekend. As the leading event for Apache Spark, the summit is the chance both to share key learnings and to gain insights from the creators of Spark, developers, vendors, and peers who are using Spark.

How to add a built-in geometry type to SparkSQL?

2015-11-18 Thread ddcd
Hi all, I'm considering adding a geometry type to SparkSQL. I know that there is a project named sparkGIS, which is an add-on to SparkSQL. The project uses user-defined types and user-defined functions, but I think a built-in type would be better in that we
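For contrast, here is a rough sketch of the UDF-based add-on approach the mail refers to. The function name stDistance, the local-mode setup, and the column layout are hypothetical illustrations, not taken from sparkGIS; the point is that the planner sees only doubles and an opaque function, which is the limitation a built-in type would remove:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("geo-udf").setMaster("local[*]"))
  val sqlContext = new SQLContext(sc)

  // Without a built-in geometry type, points are plain pairs of doubles and
  // spatial predicates are opaque UDFs the optimizer cannot reason about.
  sqlContext.udf.register("stDistance",
    (x1: Double, y1: Double, x2: Double, y2: Double) =>
      math.sqrt((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1)))
  // e.g. SELECT * FROM points WHERE stDistance(x, y, 0.0, 0.0) < 10.0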

Re: A proposal for Spark 2.0

2015-11-18 Thread Mark Hamstra
Ah, got it; by "stabilize" you meant changing the API, not just bug fixing. We're on the same page now. On Wed, Nov 18, 2015 at 3:39 PM, Kostas Sakellis wrote: > A 1.6.x release will only fix bugs - we typically don't change APIs in z > releases. The Dataset API is

RE: SequenceFile and object reuse

2015-11-18 Thread jeff saremi
You're not seeing the issue because you perform one additional "map": map { case (k, v) => (k.get(), v.toString) }. Instead of being able to use the Text as read, you had to create a tuple (single) out of the string of the text. That is exactly why I asked this question. Why do we have to do this
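A minimal sketch of the pitfall being described, assuming a local SequenceFile at a hypothetical path: Hadoop's RecordReader reuses one Writable instance per field, so materializing the raw pairs yields many references to the same (last-read) object, and the extra map is the copy that avoids it:

  import org.apache.hadoop.io.{IntWritable, Text}
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("seq-reuse").setMaster("local[*]"))
  val raw = sc.sequenceFile("/tmp/data.seq", classOf[IntWritable], classOf[Text])

  // Buggy: the collected array holds N references to the same reused
  // Writables, so every element appears to be the file's last record.
  val broken = raw.collect()

  // The extra map copies each record into immutable values first.
  val fixed = raw.map { case (k, v) => (k.get(), v.toString) }.collect()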

FW: SequenceFile and object reuse

2015-11-18 Thread jeff saremi
I sent this to the user forum and got no responses. Could someone here please help? Thanks, Jeff. From: jeffsar...@hotmail.com To: u...@spark.apache.org Subject: SequenceFile and object reuse Date: Fri, 13 Nov 2015 13:29:58 -0500 So we tried reading a SequenceFile in Spark and realized that all

Hash Partitioning & Sort Merge Join

2015-11-18 Thread gsvic
In the case of a sort-merge join in which a shuffle (exchange) will be performed, I have the following questions (please correct me if my understanding is not correct): Let's say that relation A is a JSONRelation (640 MB) on HDFS, where the block size is 64 MB. This will produce a Scan
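A sketch of the scenario as stated, with hypothetical HDFS paths and join key: 640 MB over 64 MB blocks gives the roughly 10 scan partitions implied above, and the exchange then re-partitions both sides before the sorted merge:

  // sc: SparkContext assumed already in scope (e.g. the spark-shell).
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)

  // ~640 MB of JSON over 64 MB HDFS blocks reads as roughly 10 scan partitions.
  val a = sqlContext.read.json("hdfs:///data/relationA")
  val b = sqlContext.read.json("hdfs:///data/relationB")

  // The exchange hash-partitions both sides on the join key into
  // spark.sql.shuffle.partitions partitions (200 by default); each
  // partition is then sorted on the key and the sorted runs merged.
  sqlContext.setConf("spark.sql.shuffle.partitions", "200")
  val joined = a.join(b, a("id") === b("id"))
  joined.explain() // SortMergeJoin over Exchange(hashpartitioning(...)) nodes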

Re: FW: SequenceFile and object reuse

2015-11-18 Thread Jeff Zhang
Would this be an issue with the raw data? I used the following simple code and don't hit the issue you mentioned. Otherwise, it would be better to share your code.
  val rdd = sc.sequenceFile("/Users/hadoop/Temp/Seq", classOf[IntWritable], classOf[Text])
  rdd.map { case (k, v) => (k.get(), v.toString) }.collect()