Re: Spark cluster multi tenancy

2015-08-26 Thread Jerrick Hoang
Would be interested to know the answer too. On Wed, Aug 26, 2015 at 11:45 AM, Sadhan Sood sadhan.s...@gmail.com wrote: Interestingly, if there is nothing running on dev spark-shell, it recovers successfully and regains the lost executors. Attaching the log for that. Notice, the Registering

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-24 Thread Jerrick Hoang
[mailto:mich...@databricks.com] *Sent:* Monday, August 24, 2015 2:13 PM *To:* Philip Weaver philip.wea...@gmail.com *Cc:* Jerrick Hoang jerrickho...@gmail.com; Raghavendra Pandey raghavendra.pan...@gmail.com; User user@spark.apache.org; Cheng, Hao hao.ch...@intel.com *Subject:* Re: Spark Sql behaves

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-23 Thread Jerrick Hoang
Does anybody have any suggestions? On Fri, Aug 21, 2015 at 3:14 PM, Jerrick Hoang jerrickho...@gmail.com wrote: Is there a workaround without updating Hadoop? Would really appreciate it if someone could explain what Spark is trying to do here and what an easy way to turn this off would be. Thanks all

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-21 Thread Jerrick Hoang
query. *From:* Jerrick Hoang [mailto:jerrickho...@gmail.com] *Sent:* Thursday, August 20, 2015 1:46 PM *To:* Cheng, Hao *Cc:* Philip Weaver; user *Subject:* Re: Spark Sql behaves strangely with tables with a lot of partitions I cloned from TOT after 1.5.0 cut off. I noticed there were

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-21 Thread Jerrick Hoang
version 2.7.1. It is known that s3a, which is available in 2.7, works really well with Parquet. They fixed a lot of issues related to metadata reading there... On Aug 21, 2015 11:24 PM, Jerrick Hoang jerrickho...@gmail.com wrote: @Cheng, Hao : Physical plans show that it got stuck on scanning S3
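A minimal sketch of pointing Spark at Parquet data over s3a, assuming Hadoop 2.7+ with the hadoop-aws module on the classpath; the bucket, path, and credentials below are hypothetical and only mirror the single-partition query discussed in this thread:

```scala
// Sketch only: configure the s3a connector and read one Parquet partition.
// Assumes an existing SparkContext (sc) and SQLContext (sqlContext), as in spark-shell.
sc.hadoopConfiguration.set("fs.s3a.access.key", "<access-key>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<secret-key>")

// Hypothetical bucket and layout matching the date-partitioned table in the thread.
val df = sqlContext.read.parquet("s3a://my-bucket/events/date=20140701/")
println(df.count())
```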

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
You can try setting spark.sql.sources.partitionDiscovery.enabled to false. BTW, which version are you using? Hao *From:* Jerrick Hoang [mailto:jerrickho...@gmail.com] *Sent:* Thursday, August 20, 2015 12:16 PM *To:* Philip Weaver *Cc:* user *Subject:* Re: Spark Sql behaves
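For reference, a sketch of how that setting might be applied from a Spark 1.x shell session; the table name and query are hypothetical and only mirror the pattern discussed in this thread:

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext (sc), as in spark-shell.
val sqlContext = new HiveContext(sc)

// Disable automatic partition discovery for data source tables, as suggested above.
sqlContext.setConf("spark.sql.sources.partitionDiscovery.enabled", "false")

// Hypothetical single-partition query; with discovery off, Spark should not
// need to enumerate every partition directory up front.
sqlContext.sql("SELECT COUNT(*) FROM events WHERE `date` = 20140701").show()
```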

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
at 7:51 PM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I did a simple experiment with Spark SQL. I created a partitioned parquet table with only one partition (date=20140701). A simple `select count(*) from table where date=20140701` would run very fast (0.1 seconds). However, as I

Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
Hi all, I did a simple experiment with Spark SQL. I created a partitioned parquet table with only one partition (date=20140701). A simple `select count(*) from table where date=20140701` would run very fast (0.1 seconds). However, as I added more partitions, the query took longer and longer. When
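A minimal sketch of the experiment as described, using a hypothetical table name and location with the Spark 1.x HiveContext API:

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext (sc), as in spark-shell.
val sqlContext = new HiveContext(sc)

// Hypothetical partitioned Parquet table with a single date partition.
sqlContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS events (id BIGINT, value STRING)
  PARTITIONED BY (`date` INT)
  STORED AS PARQUET
  LOCATION '/data/events'
""")
sqlContext.sql("ALTER TABLE events ADD IF NOT EXISTS PARTITION (`date`=20140701)")

// The thread's observation: this single-partition count is fast when the table
// has one partition, but slows down as the total number of partitions grows.
sqlContext.sql("SELECT COUNT(*) FROM events WHERE `date` = 20140701").show()
```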

Re: Spark failed while trying to read parquet files

2015-08-07 Thread Jerrick Hoang
fixed in (the real) Parquet 1.7.0 https://issues.apache.org/jira/browse/PARQUET-136 Cheng On 8/8/15 6:20 AM, Jerrick Hoang wrote: Hi all, I have a partitioned parquet table (very small table with only 2 partitions). The version of spark is 1.4.1, parquet version is 1.7.0. I applied

Spark failed while trying to read parquet files

2015-08-07 Thread Jerrick Hoang
Hi all, I have a partitioned parquet table (very small table with only 2 partitions). The version of Spark is 1.4.1 and the Parquet version is 1.7.0. I applied this patch to Spark [SPARK-7743], so I assume that Spark can read Parquet files normally; however, I'm getting this when trying to do a simple

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Jerrick Hoang
How big is the dataset? How complicated is the query? On Sun, Jul 26, 2015 at 12:47 AM Louis Hust louis.h...@gmail.com wrote: Hi, all, I am using a Spark DataFrame to fetch a small table from MySQL, and I found it costs much more than directly accessing MySQL using JDBC. Time cost for Spark is about
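For context, a sketch of the Spark 1.4 DataFrame JDBC read being compared against direct MySQL access; the connection details and table name below are hypothetical:

```scala
import java.util.Properties
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext (sc), as in spark-shell.
val sqlContext = new SQLContext(sc)

// Hypothetical connection details.
val url = "jdbc:mysql://dbhost:3306/testdb"
val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")

// Even for a tiny table, Spark plans and schedules a distributed job,
// which adds overhead that a plain JDBC query from the driver does not pay.
val df = sqlContext.read.jdbc(url, "small_table", props)
println(df.count())
```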

Re: Spark-hive parquet schema evolution

2015-07-21 Thread Jerrick Hoang
take schema evolution into account. Could you please give a concrete use case? Are you trying to write Parquet data with extra columns into an existing metastore Parquet table? Cheng On 7/21/15 1:04 AM, Jerrick Hoang wrote: I'm new to Spark, any ideas would be much appreciated! Thanks

Re: Spark-hive parquet schema evolution

2015-07-20 Thread Jerrick Hoang
I'm new to Spark, any ideas would be much appreciated! Thanks On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm aware of the support for schema evolution via DataFrame API. Just wondering what would be the best way to go about dealing with schema

Spark-hive parquet schema evolution

2015-07-18 Thread Jerrick Hoang
Hi all, I'm aware of the support for schema evolution via DataFrame API. Just wondering what would be the best way to go about dealing with schema evolution with Hive metastore tables. So, say I create a table via SparkSQL CLI, how would I deal with Parquet schema evolution? Thanks, J
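One possible sketch of the two approaches this thread contrasts: Parquet schema merging through the DataFrame API versus explicitly evolving a Hive metastore table. The path, table, and column names are hypothetical, and the mergeSchema read option assumes a Spark release that exposes it (earlier 1.x releases merged Parquet schemas by default):

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext (sc), as in spark-shell.
val sqlContext = new HiveContext(sc)

// DataFrame API: combine the footers of all Parquet files under a path into
// one schema containing the union of their columns.
val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("/data/events")
merged.printSchema()

// Metastore table: one common approach is to evolve the table definition
// explicitly so that newly added Parquet columns become visible to SQL.
sqlContext.sql("ALTER TABLE events ADD COLUMNS (new_col STRING)")
```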

Re: Getting not implemented by the TFS FileSystem implementation

2015-07-16 Thread Jerrick Hoang
So, this has to do with the fact that 1.4 has a new way of interacting with the HiveMetastore; still investigating. Would really appreciate it if anybody has any insights :) On Tue, Jul 14, 2015 at 4:28 PM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm upgrading from Spark 1.3 to Spark 1.4

Getting not implemented by the TFS FileSystem implementation

2015-07-14 Thread Jerrick Hoang
Hi all, I'm upgrading from Spark 1.3 to Spark 1.4, and when trying to run the spark-sql CLI it gave a ```java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation``` exception. I did not get this error with 1.3 and I don't use any TFS FileSystem. Full stack trace is

Re: Basic Spark SQL question

2015-07-13 Thread Jerrick Hoang
Well, for ad hoc queries you can use the CLI. On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez zlgonza...@yahoo.com.invalid wrote: Hi, I have a question for Spark SQL. Is there a way to be able to use Spark SQL on YARN without having to submit a job? Bottom line here is I want to be able to

hive-site.xml spark1.3

2015-07-13 Thread Jerrick Hoang
Hi all, I have conf/hive-site.xml pointing to my Hive metastore, but the spark-sql CLI doesn't pick it up (copying the same conf/ files to Spark 1.4 and 1.2 works fine). Just wondering if someone has seen this before. Thanks
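A small diagnostic sketch, assuming a spark-shell session, for checking whether the HiveContext actually picked up the metastore settings from conf/hive-site.xml:

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext (sc), as in spark-shell.
val sqlContext = new HiveContext(sc)

// If hive-site.xml was picked up from conf/, this should print the configured
// metastore URI rather than the fallback value.
println(sqlContext.getConf("hive.metastore.uris", "<not set>"))

// Listing databases is another quick check that the intended metastore
// (rather than a local Derby one) is being used.
sqlContext.sql("SHOW DATABASES").show()
```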

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm new to Spark and this question may be trivial or may have already been answered, but when I do a 'describe table' from the SparkSQL CLI it seems to try looking at all records in the table (which takes a really long time for a big table) instead

SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Hi all, I'm new to Spark and this question may be trivial or may have already been answered, but when I do a 'describe table' from the SparkSQL CLI it seems to try looking at all records in the table (which takes a really long time for a big table) instead of just giving me the metadata of the table. Would