RE: kryo

2016-05-12 Thread Younes Naguib
) at org.joda.time.DateTimeZone.convertUTCToLocal(DateTimeZone.java:925) Any ideas? Thanks From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: May-11-16 5:32 PM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: kryo Have you seen this thread ? http://search-hadoop.com/m/q3RTtpO0qI3cp06/JodaDateTimeSerializer+spark

kryo

2016-05-11 Thread Younes Naguib
it, and register it in the spark-defaults conf. Thanks, Younes Naguib
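A minimal sketch of the registration being discussed. This assumes the third-party `kryo-serializers` library (which provides `JodaDateTimeSerializer`, the class named in the linked thread) is on the application classpath; the registrator class name and package are illustrative, not from the original messages:

```scala
import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer
import org.apache.spark.serializer.KryoRegistrator

// Custom registrator that tells Kryo how to serialize Joda's DateTime.
class JodaRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[org.joda.time.DateTime], new JodaDateTimeSerializer())
  }
}
```

With the class compiled into the application jar, it would be wired up in spark-defaults.conf via `spark.serializer org.apache.spark.serializer.KryoSerializer` and `spark.kryo.registrator` set to the registrator's fully qualified name.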

Spark/Parquet

2016-04-14 Thread Younes Naguib
Hi all, When is Parquet 2.0 planned in Spark? Or is it already there? Younes Naguib Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC H3G 1R8 Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.nag...@tritondigital.com

RE: Subquery performance

2016-03-19 Thread Younes Naguib
Any way to cache the subquery or force a broadcast join without persisting it? y From: Michael Armbrust [mailto:mich...@databricks.com] Sent: March-17-16 8:59 PM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: Subquery performance Try running EXPLAIN on both versions of the query

Subquery performance

2016-03-19 Thread Younes Naguib
Hi all, I'm running a query that looks like the following: Select col1, count(1) From (Select col2, count(1) from tab2 group by col2) Inner join tab1 on (col1=col2) Group by col1 This creates a very large shuffle, 10 times the data size, as if the subquery was executed for each row. Anything
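One workaround for the repeated-subquery shuffle, sketched against the query shape in the thread (table and column names taken from it): materialize the aggregated subquery once with CACHE TABLE so the join reads the cached result instead of recomputing it. Running EXPLAIN on both plans, as suggested in the reply, would confirm the difference:

```sql
-- Materialize the aggregated subquery once, in memory (session-scoped).
CACHE TABLE agg2 AS SELECT col2, count(1) AS cnt FROM tab2 GROUP BY col2;

-- Join against the cached result instead of the inline subquery.
SELECT col1, count(1)
FROM agg2
INNER JOIN tab1 ON (col1 = col2)
GROUP BY col1;
```

Separately, a side of the join whose estimated size falls below `spark.sql.autoBroadcastJoinThreshold` is broadcast automatically, which avoids the shuffle entirely for small aggregates.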

Low latency queries much slower in 1.6.0

2016-02-03 Thread Younes Naguib
Hi all, Since 1.6.0, low latency queries are much slower. This seems to be connected to the multi-user support in the thrift server. On any newly created session, jobs are added to fill the session cache with information related to the tables it queries. Here are the details for this job: load at

ctas fails with "No plan for CreateTableAsSelect"

2016-01-26 Thread Younes Naguib
Hi, I'm running a CTAS, and it fails with "Error: java.lang.AssertionError: assertion failed: No plan for CreateTableAsSelect HiveTable". Here is what my SQL looks like: Create tbl ( Col1 timestamp , Col2 string, Col3 int, . ) partitioned by (year int, month int, day

RE: ctas fails with "No plan for CreateTableAsSelect"

2016-01-26 Thread Younes Naguib
SQL on beeline and connecting to the thriftserver. Younes From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: January-26-16 11:05 AM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: ctas fails with "No plan for CreateTableAsSelect" Were you using HiveContext or SQLContext ? Ca

RE: ctas fails with "No plan for CreateTableAsSelect"

2016-01-26 Thread Younes Naguib
11:11 AM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: ctas fails with "No plan for CreateTableAsSelect" Maybe try enabling the following (false by default): "spark.sql.hive.convertCTAS" doc = "When true, a table created by a Hive CTAS s

RE: ctas fails with "No plan for CreateTableAsSelect"

2016-01-26 Thread Younes Naguib
Patil [mailto:tejas.patil...@gmail.com] Sent: January-26-16 11:39 AM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: ctas fails with "No plan for CreateTableAsSelect" In CTAS, you should not specify the column information as it is derived from the result of SELECT statement.

RE: ctas fails with "No plan for CreateTableAsSelect"

2016-01-26 Thread Younes Naguib
: Younes Naguib [mailto:younes.nag...@tritondigital.com] Sent: January-26-16 11:42 AM To: 'Tejas Patil' Cc: user@spark.apache.org Subject: RE: ctas fails with "No plan for CreateTableAsSelect" The destination table is partitioned. If I don’t specify the columns I get :
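Hive does not allow CTAS to create a partitioned table, which is consistent with both errors in this thread (the destination is partitioned, and dropping the column list alone doesn't help). The usual workaround, sketched here with the thread's column names and a hypothetical source table `src`, is to split the statement into a plain CREATE TABLE followed by a dynamic-partition INSERT:

```sql
-- 1. Create the partitioned destination with an explicit schema.
CREATE TABLE tbl (
  col1 TIMESTAMP,
  col2 STRING,
  col3 INT
)
PARTITIONED BY (year INT, month INT, day INT);

-- 2. Allow fully dynamic partitions, then load via SELECT.
-- Partition columns must come last in the SELECT list.
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE tbl PARTITION (year, month, day)
SELECT col1, col2, col3, year, month, day FROM src;
```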

Cache table as

2016-01-20 Thread Younes Naguib
Hi all, I'm connected to the thrift server using beeline on Spark 1.6. I used: cache table tbl as select * from table1 I see table1 in the storage memory and I can use it. But when I reconnect, I can't query it anymore. I get: Error: org.apache.spark.sql.AnalysisException: Table not found:
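A likely explanation: in Spark 1.6 the thrift server became multi-session by default, and CACHE TABLE ... AS SELECT creates a temporary table that lives only in the session that created it, so it vanishes on reconnect. If the pre-1.6 shared-session behavior is wanted, it can reportedly be restored with one setting in spark-defaults.conf (a sketch; verify against your deployment):

```
# Revert the 1.6 thrift server to a single shared session,
# so temporary and cached tables survive across JDBC connections.
spark.sql.hive.thriftServer.singleSession  true
```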

Securing objects on the thrift server

2015-12-15 Thread Younes Naguib
Hi all, I get this error when running "show current roles;" : 2015-12-15 15:50:41 WARN org.apache.hive.service.cli.thrift.ThriftCLIService ThriftCLIService:681 - Error fetching results: org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle:

RE: Securing objects on the thrift server

2015-12-15 Thread Younes Naguib
The one coming with spark 1.5.2. y From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: December-15-15 1:59 PM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: Securing objects on the thrift server Which Hive release are you using ? Please take a look at HIVE-8529 Cheers On Tue, Dec 15

metastore_db

2015-11-12 Thread Younes Naguib
Hi all, Is there any documentation on how to set up the metastore_db on MySQL in Spark? I did find a load of information, but it all seems to be some "hack" for Spark. Thanks Younes
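The non-hacky route is the standard Hive mechanism: drop a hive-site.xml into Spark's conf/ directory pointing the metastore at MySQL, and put the MySQL JDBC driver jar on the driver classpath (e.g. via --driver-class-path). A minimal sketch, with host, database name, and credentials as placeholders:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>
</configuration>
```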

Spark reading from S3 getting very slow

2015-11-04 Thread Younes Naguib
new () }.toDF() myDF.registerTempTable("tbl") sqlContext.sql("select count(1) from tbl").collect() Any help/ideas? Thanks, Younes Naguib
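Slow S3 reads in this era were often due to the older s3n filesystem; a common mitigation, assuming Hadoop 2.6+ with the hadoop-aws jar (and its AWS SDK) on the classpath, is switching to s3a and raising its connection limit. A sketch for the spark-shell, with bucket, path, and credentials as placeholders:

```scala
// Assumes hadoop-aws and the AWS SDK are on the classpath.
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
// Allow more parallel connections to S3 than the default.
sc.hadoopConfiguration.set("fs.s3a.connection.maximum", "100")

// Read with the s3a:// scheme instead of s3n://.
val lines = sc.textFile("s3a://your-bucket/path/")
```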

heap memory

2015-10-30 Thread Younes Naguib
Hi all, I'm running a spark shell: bin/spark-shell --executor-memory 32G --driver-memory 8G I keep getting: 15/10/30 13:41:59 WARN MemoryManager: Total allocation exceeds 95.00% (2,147,483,647 bytes) of heap memory Any help? Thanks, Younes Naguib

Broadcast table

2015-10-26 Thread Younes Naguib
Hi all, I use the thrift server, and I cache a table using "cache table mytab". Is there any SQL to broadcast it too? Thanks Younes Naguib
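There is no dedicated SQL statement in this Spark version to broadcast a table; the planner decides based on estimated size. What can be done from pure SQL, sketched below with illustrative table names, is raising the size threshold under which a join side gets broadcast automatically:

```sql
-- Tables whose estimated size is below this threshold (in bytes)
-- are broadcast automatically when joined. 104857600 = 100 MB.
SET spark.sql.autoBroadcastJoinThreshold = 104857600;

-- If mytab's statistics fit under the threshold, the planner
-- broadcasts it rather than shuffling bigtab.
SELECT * FROM bigtab JOIN mytab ON (bigtab.k = mytab.k);
```

(An explicit `/*+ BROADCAST(t) */` hint only arrived in later Spark releases.)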

Succinct experience

2015-10-19 Thread Younes Naguib
Hi all, Does anyone have any experience with SuccinctRDD? Thanks, Younes

Dynamic partition pruning

2015-10-16 Thread Younes Naguib
Hi all, I'm running SQL on Spark 1.5.1 using tables based on Parquet. My tables are not pruned when joined on partition columns. Ex: Select from tab where partcol=1 will prune on value 1; Select from tab join dim on (dim.partcol=tab.partcol) where dim.partcol=1 will scan all
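Until dynamic partition pruning lands, the usual workaround is to restate the partition predicate on the fact table itself, since 1.5.1 does not infer the transitive filter through the join. A sketch using the thread's own columns (the select list is a placeholder):

```sql
-- Without the duplicated predicate, tab is fully scanned.
SELECT count(1)
FROM tab JOIN dim ON (dim.partcol = tab.partcol)
WHERE dim.partcol = 1
  AND tab.partcol = 1;  -- restated predicate enables static pruning on tab
```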

RE: Dynamic partition pruning

2015-10-16 Thread Younes Naguib
Thanks, Do you have a Jira I can follow for this? y From: Michael Armbrust [mailto:mich...@databricks.com] Sent: October-16-15 2:18 PM To: Younes Naguib Cc: user@spark.apache.org Subject: Re: Dynamic partition pruning We don't support dynamic partition pruning yet. On Fri, Oct 16, 2015 at 10

Dynamic partitioning pruning

2015-10-14 Thread Younes Naguib
Hi, This feature was added in Hive 1.3. https://issues.apache.org/jira/browse/HIVE-9152 Any idea when this will be in Spark? Or is it already? Any workaround in Spark 1.5.1? Thanks, Younes

JDBC thrift server

2015-10-08 Thread Younes Naguib
Hi, We've been using the JDBC thrift server for a couple of weeks now, running queries on it like a regular RDBMS. We're about to deploy it in a shared production cluster. Any advice or warnings on such a setup? YARN or Mesos? How about dynamic resource allocation in an already running thrift
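On the dynamic-allocation question: it does work for a long-running thrift server on YARN, provided the external shuffle service is enabled on each NodeManager so shuffle files outlive released executors. A minimal spark-defaults.conf sketch (executor counts are illustrative):

```
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   50
# Required: shuffle data must be served by the node,
# not by executors that may be released.
spark.shuffle.service.enabled          true
```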

RE: Parquet file size

2015-10-07 Thread Younes Naguib
The original TSV files total 600GB and generated 40k files of 15-25MB. y From: Cheng Lian [mailto:lian.cs@gmail.com] Sent: October-07-15 3:18 PM To: Younes Naguib; 'user@spark.apache.org' Subject: Re: Parquet file size Why do you want larger files? Doesn't the resulting Parquet file contain all

RE: Parquet file size

2015-10-07 Thread Younes Naguib
Well, I only have data for 2015-08, so in the end, only 31 partitions. What I'm looking for is some reasonably sized partitions. In any case, just the ability to control the output Parquet files' size or number would be nice. Younes Naguib
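Output file count tracks the number of write tasks, so one lever available in this Spark version is reducing the task count before the write. A hedged sketch (`myDF`, the partition columns, the output path, and the target of 64 tasks are all illustrative):

```scala
// Fewer write tasks => fewer, larger Parquet files
// instead of 40k small ones. coalesce avoids a full shuffle.
myDF.coalesce(64)
  .write
  .partitionBy("year", "month", "day")
  .parquet("hdfs:///path/to/output")
```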

Parquet file size

2015-10-07 Thread Younes Naguib
Younes Naguib

Spark context on thrift server

2015-10-05 Thread Younes Naguib
Hi, We're using a Spark thrift server and we connect using JDBC to run queries. Every time we run a set query, like "set schema", it seems to affect the server, and not only the session. Is that expected behavior, or am I missing something? Younes Naguib