Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Yi Tian
Hi Will, We are planning to start implementing these functions. We hope to have a general design ready in the following week. Best Regards, Yi Tian tianyi.asiai...@gmail.com On Sep 23, 2014, at 23:39, Will Benton wrote: > Hi Yi, > > I've had some interest in implementing windowing and

Re: A couple questions about shared variables

2014-09-23 Thread Sandy Ryza
Filed https://issues.apache.org/jira/browse/SPARK-3642 for documenting these nuances. -Sandy On Mon, Sep 22, 2014 at 10:36 AM, Nan Zhu wrote: > I see, thanks for pointing this out > > > -- > Nan Zhu > > On Monday, September 22, 2014 at 12:08 PM, Sandy Ryza wrote: > > MapReduce counters do not
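For context, the nuance being tracked in SPARK-3642 is that, unlike MapReduce counters, accumulator updates made inside transformations can be applied more than once if tasks are retried. A minimal sketch of the behavior in Scala, under that assumption (app name and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("accumulator-nuances").setMaster("local[2]"))

val acc = sc.accumulator(0)
val data = sc.parallelize(1 to 100)

// Updates made inside a transformation can be applied more than once if a
// task is re-executed (failure, speculative retry), so this count is only
// approximate -- the nuance SPARK-3642 asks to document.
val doubled = data.map { x => acc += 1; x * 2 }

// Transformations are lazy: acc stays 0 until an action runs the job.
doubled.count()
println(s"records seen (may over-count on retries): ${acc.value}")
```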

SPARK-3660: Initial RDD for updateStateByKey transformation

2014-09-23 Thread Soumitra Kumar
Hello fellow developers, Thanks TD for the relevant pointers. I have created an issue: https://issues.apache.org/jira/browse/SPARK-3660 Copying the description from JIRA: " How do I initialize the state for the updateStateByKey transformation? I have word counts from a previous spark-submit run, and want to load t
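The ask, concretely: seed the streaming state with counts produced by an earlier run. A sketch of the desired usage in Scala, assuming a hypothetical updateStateByKey overload that accepts an initial RDD (no such overload existed at the time of this thread; paths, ports, and formats are illustrative):

```scala
import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("stateful-wordcount").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints")  // required for stateful operations

// Word counts saved by a previous spark-submit run (hypothetical path/format).
val previousCounts = ssc.sparkContext
  .textFile("hdfs:///wordcounts/part-*")
  .map { line =>
    val Array(word, count) = line.split("\t")
    (word, count.toLong)
  }

val updateFunc = (newValues: Seq[Long], state: Option[Long]) =>
  Some(newValues.sum + state.getOrElse(0L))

val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

// Hypothetical overload requested in SPARK-3660: seed the state with
// previousCounts instead of starting from empty.
val counts = words
  .map((_, 1L))
  .updateStateByKey[Long](updateFunc, new HashPartitioner(2), previousCounts)

counts.print()
ssc.start()
ssc.awaitTermination()
```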

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread DB Tsai
Hi Will, We're also very interested in windowing support in SparkSQL. Let us know once this is available for testing. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Sep 23

spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Priya Ch
Hi, I am using Spark 1.0.0. In my Spark code I am trying to persist an RDD to disk with rdd.persist(DISK_ONLY), but unfortunately I couldn't find the location where the RDD has been written to disk. I specified SPARK_LOCAL_DIRS and SPARK_WORKER_DIR to point to some other location rather than using the default /
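One detail worth checking in this situation: persist() is lazy, so no block files are written until an action materializes the RDD. A minimal sketch (paths illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("disk-persist-check").setMaster("local[2]"))

// persist() only marks the RDD for storage; nothing is written yet.
val rdd = sc.textFile("hdfs:///input/words.txt").persist(StorageLevel.DISK_ONLY)

// Blocks are written to disk only when an action forces evaluation.
rdd.count()
```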

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Aaron Davidson
This may be related: https://github.com/Parquet/parquet-mr/issues/211 Perhaps if we change our configuration settings for Parquet it would get better, but the performance characteristics of Snappy are pretty bad here under some circumstances. On Tue, Sep 23, 2014 at 10:13 AM, Cody Koeninger wrot

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Cody Koeninger
Cool, that's pretty much what I was thinking as far as configuration goes. Running on Mesos. Worker nodes are Amazon xlarge, so 4 cores / 15g. I've tried executor memory sizes as high as 6G. Default HDFS block size is 64m, with about 25G of total data written by a job with 128 partitions. The exception c
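For anyone hitting the same OutOfMemoryError: the Snappy decompressor allocates buffers sized to Parquet's column chunks, so one possible (unconfirmed) mitigation is to write files with a smaller Parquet row-group size, or to switch codecs. A sketch using parquet-mr's Hadoop configuration keys (values illustrative, not a verified fix):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("parquet-write"))

// parquet-mr reads these keys from the Hadoop configuration when writing.
// Smaller row groups mean smaller per-column-chunk decompression buffers
// on the read side.
sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)

// Alternative mitigation: sidestep the Snappy buffers entirely.
sc.hadoopConfiguration.set("parquet.compression", "GZIP")
```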

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Michael Armbrust
I actually submitted a patch to do this yesterday: https://github.com/apache/spark/pull/2493 Can you tell us more about your configuration? In particular, how much memory and how many cores do the executors have, and what does the schema of your data look like? On Tue, Sep 23, 2014 at 7:39 AM, Cody Koeninger

Re: RFC: Deprecating YARN-alpha API's

2014-09-23 Thread Tom Graves
Any other comments or objections on this? Thanks, Tom On Tuesday, September 9, 2014 4:39 PM, Chester Chen wrote: We were using it until recently; we are talking to our customers to see if we can get off it. Chester Alpine Data Labs On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen wrot

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Will Benton
Hi Yi, I've had some interest in implementing windowing and rollup in particular for some of my applications but haven't had them on the front of my plate yet. If you need them as well, I'm happy to start taking a look this week. best, wb - Original Message - > From: "Yi Tian" > To
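For concreteness, these are the HiveQL constructs under discussion; at the time of this thread they ran on Hive but not on SparkSQL. The table and columns below are hypothetical:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`

// Window function: running total per department.
// SparkSQL did not yet parse OVER(...) when this thread was written.
hiveContext.hql(
  """SELECT dept, name, salary,
    |       SUM(salary) OVER (PARTITION BY dept ORDER BY salary) AS running_total
    |FROM employees""".stripMargin)

// Rollup: subtotals at every grouping level, plus a grand total.
hiveContext.hql(
  """SELECT dept, role, COUNT(*)
    |FROM employees
    |GROUP BY dept, role WITH ROLLUP""".stripMargin)
```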

Re: OutOfMemoryError on parquet SnappyDecompressor

2014-09-23 Thread Cody Koeninger
So as a related question, is there any reason the settings in SQLConf aren't read from the SparkContext's conf? I understand why the SQL conf is mutable, but it's not particularly user-friendly to have most Spark configuration set via e.g. defaults.conf or --properties-file, but for Spark SQL to
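For reference, the two ways to set a Spark SQL option in 1.1, as distinct from SparkConf properties (the property name is a real Spark SQL setting; the value is illustrative):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext `sc`

// Programmatic: mutate the session's SQLConf directly.
sqlContext.setConf("spark.sql.shuffle.partitions", "32")

// Equivalent SQL statement, also usable through the Thrift/JDBC server.
sqlContext.sql("SET spark.sql.shuffle.partitions=32")
```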

RE: spark.local.dir and spark.worker.dir not used

2014-09-23 Thread Shao, Saisai
Hi, spark.local.dir is the directory used to write map output data and persisted RDD blocks, but the file paths are hashed, so you cannot directly find the persisted RDD block files; they will definitely be in these folders on your worker node, though. Thanks, Jerry From: Priya Ch [mailto:learnin
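To illustrate Jerry's point, a sketch of the configuration and what to expect on disk (the path is hypothetical; the precedence note reflects standalone-mode behavior):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Per-application setting. In standalone mode a worker-side SPARK_LOCAL_DIRS
// environment variable (spark-env.sh) takes precedence over this value if it
// is set, so check both places when blocks don't show up where expected.
val conf = new SparkConf()
  .setAppName("local-dir-demo")
  .set("spark.local.dir", "/data/spark-tmp")  // hypothetical path

val sc = new SparkContext(conf)

// Persisted blocks land on each worker under auto-generated
// spark-local-* subdirectories; the block files themselves are named
// after block ids such as rdd_<rddId>_<partition>.
```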

resources allocated for an application

2014-09-23 Thread rapelly kartheek
Hi, I am trying to find out where exactly in the Spark code resources get allocated for a newly submitted Spark application. I have a standalone Spark cluster. Can someone please point me to the right part of the code? Regards

Re: Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Reynold Xin
On Tue, Sep 23, 2014 at 12:47 AM, Yi Tian wrote: > Hi all, > > I have some questions about SparkSQL and Hive-on-Spark > > Will SparkSQL support all the Hive features in the future, or just use Hive as a data source for Spark? > Most likely not *ALL* Hive features, but almost all common fea

Question about SparkSQL and Hive-on-Spark

2014-09-23 Thread Yi Tian
Hi all, I have some questions about SparkSQL and Hive-on-Spark. Will SparkSQL support all the Hive features in the future, or just use Hive as a data source for Spark? Since Spark 1.1.0, we have Thrift server support for running HQL on Spark. Will this feature be replaced by Hive on Spark? The
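For anyone trying the Spark 1.1.0 feature mentioned here: besides the Thrift server (started with sbin/start-thriftserver.sh), HQL can be run programmatically through HiveContext. A minimal sketch; `src` is the classic Hive example table, substitute your own:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hql-demo"))
val hiveContext = new HiveContext(sc)

// Runs HiveQL through Spark's execution engine against the Hive metastore.
hiveContext.hql("SELECT key, COUNT(*) FROM src GROUP BY key")
  .collect()
  .foreach(println)
```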