Re: [SQL] parse_url does not work for Internationalized domain names ?

2018-01-12 Thread yash datta
Thanks for the prompt reply! Opened a ticket here: https://issues.apache.org/jira/browse/SPARK-23056 BR Yash On Fri, Jan 12, 2018 at 3:41 PM, StanZhai wrote: > This problem was introduced by > which is designed to >

[SQL] parse_url does not work for Internationalized domain names ?

2018-01-11 Thread yash datta
Hi devs, Stumbled across an interesting problem with the parse_url function that was implemented in Spark in https://issues.apache.org/jira/browse/SPARK-16281 when using internationalized domains in URLs, like: val url = "http://правительство.рф"
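The snippet above is cut off, but the reported symptom is that parse_url returns null for a URL whose hostname contains raw non-ASCII characters. A hedged, Spark-free sketch of the usual workaround: convert the host to its Punycode (ACE) form before handing the URL to a strict parser. This illustrates the encoding step only; how Spark's parse_url is implemented internally is an assumption drawn from the thread, not verified here.

```python
from urllib.parse import urlparse

# An internationalized domain name (IDN), as in the mailing-list example.
url = "http://правительство.рф"

# Python's urlparse handles the raw Cyrillic host fine.
host = urlparse(url).hostname

# Punycode (ACE) form of the host; strict URI parsers that reject raw
# non-ASCII hostnames generally accept this all-ASCII encoding.
ace_host = host.encode("idna").decode("ascii")
ascii_url = "http://" + ace_host

print(ascii_url)
```

The printed URL is all-ASCII, with each encoded label carrying the `xn--` prefix, so it can be fed to parsers that predate IDN support.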

Re: Dataframe Partitioning

2016-03-01 Thread yash datta
+1 This is one of the most common problems we encounter in our flow. Mark, I am happy to help if you would like to share some of the workload. Best Yash On Wednesday 2 March 2016, Mark Hamstra wrote: > I don't entirely agree. You're best off picking the right size

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread yash datta
+1 On Tue, Jan 5, 2016 at 1:57 PM, Jian Feng Zhang wrote: > +1 > > We use Python 2.7+ and 3.4+ to call PySpark. > > 2016-01-05 15:58 GMT+08:00 Kushal Datta : > >> +1 >> >> >> Dr. Kushal Datta >> Senior Research Scientist >> Big Data Research &

Re: KryoSerializer for closureSerializer in DAGScheduler

2015-08-31 Thread yash datta
support this. See > https://github.com/apache/spark/pull/6361 and > https://issues.apache.org/jira/browse/SPARK-7708 for some discussion of > the difficulties here. > > On Mon, Aug 31, 2015 at 3:44 AM, yash datta <sau...@gmail.com> wrote: > >> Hi devs, >>

KryoSerializer for closureSerializer in DAGScheduler

2015-08-31 Thread yash datta
inally created has 127 partitions. Calling unioned.collect leads to serialization of the UnionRDD. I am using Spark 1.2.1. Any help regarding this will be highly appreciated. Best Yash Datta -- When events unfold with calm and ease When the winds that blow are merely breeze Learn from nature, from bi
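The truncated message above describes a stack overflow while serializing a UnionRDD with a long lineage through the default Java closure serializer. As a rough, Spark-free analogy (not Spark's actual code path), serializing a deeply nested object graph exhausts the recursion budget in much the same way:

```python
import pickle


def nested(depth):
    """Build a chain of nested 1-tuples, analogous to a deep RDD lineage."""
    x = None
    for _ in range(depth):
        x = (x,)
    return x


# A shallow chain serializes fine.
pickle.dumps(nested(100))

# A very deep chain blows the recursion budget, much like serializing
# an RDD produced by chaining .union() many times.
try:
    pickle.dumps(nested(100_000))
    print("serialized")
except RecursionError:
    print("recursion limit hit")
```

In Spark the usual mitigations were to build one flat union (e.g. `sc.union(rdds)` rather than chaining `rdd1.union(rdd2).union(...)`) or to checkpoint periodically so the lineage stays shallow; which applies depends on the job.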

Re: creating hive packages for spark

2015-04-27 Thread yash datta
Hi, you can build the spark-project Hive fork from here: https://github.com/pwendell/hive/tree/0.13.1-shaded-protobuf Hope this helps. On Mon, Apr 27, 2015 at 3:23 PM, Manku Timma manku.tim...@gmail.com wrote: Hello Spark developers, I want to understand the procedure to create the

Re: Stackoverflow in createDataFrame.

2015-04-24 Thread yash datta
This is already reported: https://issues.apache.org/jira/browse/SPARK-6999 On 24 Apr 2015 18:11, Jan-Paul Bultmann janpaulbultm...@me.com wrote: Hey, I get a stack overflow when calling the following method on SQLContext. def createDataFrame(rowRDD: JavaRDD[Row], columns:

Re: Building spark 1.2 from source requires more dependencies

2015-03-30 Thread yash datta
Hi all, When selecting large data in Spark SQL (SELECT * query), I see a buffer overflow exception from Kryo: 15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 1, required: 2
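For a Kryo "Buffer overflow. Available: 1, required: ..." error, the standard remedy is to enlarge Kryo's serialization buffer. A sketch of the relevant spark-defaults.conf entries; note the property name changed across releases (older 1.x versions used spark.kryoserializer.buffer.max.mb with an integer megabyte value, later versions use spark.kryoserializer.buffer.max with a size suffix), so check the docs for the release in use:

```properties
# Use Kryo and give it room to grow when serializing wide rows.
spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer       64k
spark.kryoserializer.buffer.max   512m
```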

Re: Spark SQL, Hive Parquet data types

2015-02-20 Thread yash datta
For the old Parquet path (available in 1.2.1), I made a few changes to allow reading/writing a table partitioned on a timestamp-type column: https://github.com/apache/spark/pull/4469 On Fri, Feb 20, 2015 at 8:28 PM, The Watcher watche...@gmail.com wrote: 1. In Spark 1.3.0,