Re: cache table vs. parquet table performance

2019-04-17 Thread Bin Fan
> …hot set of data. I'm processing records with nested structure, containing subtypes and arrays. 1 record takes up several KB. I tried to make some improvement with cache table: cache table event_jan_01 as select * from events where day_registered = 20190102; …

Re: cache table vs. parquet table performance

2019-01-16 Thread Jörn Franke
> …server and I'm searching for best performing solution to query hot set of data. I'm processing records with nested structure, containing subtypes and arrays. 1 record takes up several KB. I tried to make some improvement with cache table: cache table event_jan_01 as select * from …

Re: cache table vs. parquet table performance

2019-01-16 Thread Todd Nist
> …spark-thrift server and I'm searching for best performing solution to query hot set of data. I'm processing records with nested structure, containing subtypes and arrays. 1 record takes up several KB. I tried to make some improvement with cache table: cache table event_jan_01 as select …

cache table vs. parquet table performance

2019-01-15 Thread Tomas Bartalos
Hello, I'm using the spark-thrift server and I'm searching for the best performing solution to query a hot set of data. I'm processing records with nested structure, containing subtypes and arrays; one record takes up several KB. I tried to make some improvement with cache table: cache table event_jan_01 …
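A minimal sketch of the approach described in this question, assuming a Spark session behind the Thrift server (the `events` table, `day_registered` column, and date value come from the thread; adjust to your schema):

```scala
// CACHE TABLE ... AS SELECT is eager, so the slice is materialized here
// in Spark's in-memory columnar cache.
spark.sql(
  """CACHE TABLE event_jan_01 AS
    |SELECT * FROM events WHERE day_registered = 20190102""".stripMargin)

spark.sql("SELECT count(*) FROM event_jan_01").show()  // served from the cache
spark.sql("UNCACHE TABLE event_jan_01")                // free the memory when done
```

Nested struct and array columns remain queryable from the cached table with the usual dot and `explode` syntax.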

Re: Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-20 Thread Yong Zhang
…and will be cached individually. Yong. From: Taotao.Li <charles.up...@gmail.com> Sent: Sunday, November 20, 2016 6:18 AM To: Rabin Banerjee Cc: Yong Zhang; user; Mich Talebzadeh; Tathagata Das Subject: Re: Will spark cache …

Re: Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-20 Thread Taotao.Li
> …Yong. From: Rabin Banerjee <dev.rabin.baner...@gmail.com> Sent: Friday, November 18, 2016 10:36 AM To: user; Mich Talebzadeh; Tathagata Das Subject: Will spark cache table …

Re: Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-18 Thread Rabin Banerjee
…Sent: Friday, November 18, 2016 10:36 AM To: user; Mich Talebzadeh; Tathagata Das Subject: Will spark cache table once even if I call read/cache on the same table multiple times. Hi All, I am working in a project where code is divided into multiple …

Re: Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-18 Thread Yong Zhang
…10:36 AM To: user; Mich Talebzadeh; Tathagata Das Subject: Will spark cache table once even if I call read/cache on the same table multiple times. Hi All, I am working in a project where code is divided into multiple reusable modules. I am not able to understand spark persist/cache in that context …

Will spark cache table once even if I call read/cache on the same table multiple times

2016-11-18 Thread Rabin Banerjee
Hi All, I am working in a project where code is divided into multiple reusable modules. I am not able to understand spark persist/cache in that context. My question is: will spark cache a table once even if I call read/cache on the same table multiple times? Sample code: TableReader …
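As a hedged illustration of the question (the path and names are hypothetical), the answer hinges on whether the repeated `.cache()` calls hit the same DataFrame reference or a fresh read:

```scala
// Same reference: the second cache() is a no-op. The plan is already marked
// for caching, and the data is materialized once, on the first action.
val events = spark.read.parquet("/data/events")   // hypothetical path
events.cache()
events.cache()                                    // no-op, already marked
events.count()                                    // materializes the cache

// Re-reading the source builds a new logical plan. Whether Spark reuses the
// existing cache entry for it varies by version; the replies in this thread,
// written against Spark 1.x/2.0, report it being cached individually.
val eventsAgain = spark.read.parquet("/data/events")
eventsAgain.cache()                               // may create a second copy
```

The safe pattern for reusable modules is therefore to create and cache the DataFrame once and pass that reference around, rather than re-reading in each module.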

How to verify in Spark 1.6.x usage, User Memory used after Cache table

2016-07-15 Thread Yogesh Rajak
…But I did not find any way to check if Spark is using User Memory or not. Please let me know if we can verify the scenario. Thanks, Yogesh
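One programmatic way to see what a cached table actually occupies, without relying on the web UI, is the developer API on `SparkContext`. A sketch (the table name is hypothetical; note this reports block/storage memory for the cached relation, not the "User Memory" region itself):

```scala
sqlContext.sql("CACHE TABLE hot_table")   // hypothetical table, cached eagerly

// Each cached table shows up as an in-memory relation in the RDD storage info.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.memSize} bytes in memory, " +
          s"${info.diskSize} bytes on disk, ${info.numCachedPartitions} partitions cached")
}
```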

Re: Re: About cache table performance in spark sql

2016-02-04 Thread Takeshi Yamamuro
> …that the slow process is mainly caused by GC pressure and I understand this difference from your advice. I had each executor memory with 6GB and tried to cache the table. I had 3 executors and finally I can see some info from the Spark job UI storage, like the following: …

Re: Re: About cache table performance in spark sql

2016-02-04 Thread fightf...@163.com
Oh, thanks. Makes sense to me. Best, Sun. fightf...@163.com From: Takeshi Yamamuro Date: 2016-02-04 16:01 To: fightf...@163.com CC: user Subject: Re: Re: About cache table performance in spark sql. Hi, Parquet data are column-wise and highly compressed, so the size of deserialized rows …
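If the GC pressure described in this thread is the bottleneck, one commonly suggested mitigation (my suggestion, not something tested in the thread) is to cache in serialized form rather than as deserialized on-heap objects:

```scala
import org.apache.spark.storage.StorageLevel

val df = sqlContext.read.parquet("/warehouse/events")  // hypothetical path
// Serialized cache: a smaller heap footprint and fewer long-lived objects
// for the GC to scan, at the cost of CPU to deserialize on each access.
df.persist(StorageLevel.MEMORY_ONLY_SER)
df.count()                                             // materialize the cache
```

This narrows, but does not eliminate, the size gap between compressed Parquet on disk and the in-memory representation.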

Re: Re: About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
Hi, Thanks a lot for your explanation. I know that the slow process is mainly caused by GC pressure, and I understand this difference from your advice. I gave each executor 6GB of memory and tried to cache the table. I had 3 executors, and finally I can see some info from the Spark job UI storage, like the following: …

Re: About cache table performance in spark sql

2016-02-03 Thread Prabhu Joseph
…does not have enough heap. Thanks, Prabhu Joseph. On Thu, Feb 4, 2016 at 11:25 AM, fightf...@163.com <fightf...@163.com> wrote: > Hi, I want to make sure that the cache table indeed would accelerate sql queries. Here is one of my use cases: impala table size: 24.5…

About cache table performance in spark sql

2016-02-03 Thread fightf...@163.com
Hi, I want to make sure that the cache table would indeed accelerate sql queries. Here is one of my use cases: impala table size: 24.59 GB, no partitions, with about 1 billion+ rows. I use sqlContext.sql to run queries over this table, and try the cache and uncache commands to see …
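A rough harness for the cache/uncache comparison described here (the table name is a placeholder; `sqlContext` as in the Spark 1.6 era this thread targets):

```scala
// Tiny timing helper for one-off comparisons; not a benchmark.
def timed[T](label: String)(body: => T): T = {
  val start  = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e9}%.2f s")
  result
}

timed("cold")   { sqlContext.sql("SELECT count(*) FROM big_table").collect() }
sqlContext.sql("CACHE TABLE big_table")   // eager: materializes the cache now
timed("cached") { sqlContext.sql("SELECT count(*) FROM big_table").collect() }
sqlContext.sql("UNCACHE TABLE big_table")
```

Run the cached query more than once: the first post-cache run can still pay one-time costs, and, as the replies note, a table larger than the available executor memory may spill or thrash rather than accelerate.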

Cache table as

2016-01-20 Thread Younes Naguib
Hi all, I'm connected to the thrift server using beeline on Spark 1.6. I used: cache table tbl as select * from table1. I see table1 in the storage memory and I can use it. But when I reconnect, I can't query it anymore. I get: Error: org.apache.spark.sql.AnalysisException: Table not found: table1
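The table created by `CACHE TABLE ... AS SELECT` is a temporary table scoped to the creating session, which would explain the behavior above. One possible workaround (an assumption on my part, untested here) is Spark 1.6's single-session mode, which makes all JDBC connections to the Thrift server share temporary tables:

```scala
// Start the server with:
//   start-thriftserver.sh --conf spark.sql.hive.thriftServer.singleSession=true
// Then a cached temp table created from one beeline connection should remain
// visible to later connections, since they all share one session:
sqlContext.sql("CACHE TABLE tbl AS SELECT * FROM table1")
```

The trade-off is that single-session mode also shares all other session state (current database, temp views) across every client.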

why "cache table a as select * from b" will do shuffle, and create 2 stages.

2015-12-14 Thread ant2nebula
why "cache table a as select * from b" will do a shuffle and create 2 stages. Example: table "ods_pay_consume" is from "KafkaUtils.createDirectStream"; hiveContext.sql("cache table dwd_pay_consume as select * from ods_pay_consume") …

why "cache table a as select * from b" will do shuffle, and create 2 stages.

2015-12-13 Thread ant2nebula
why "cache table a as select * from b" will do a shuffle and create 2 stages. Example: table "ods_pay_consume" is from "KafkaUtils.createDirectStream"; hiveContext.sql("cache table dwd_pay_consume as select * from ods_pay_consume") …

SparkSQL cache table with multiple replicas

2015-07-03 Thread David Sabater Dinter
Hi all, Do you know if there is an option to specify how many replicas we want while caching a table in memory in the SparkSQL Thrift server? I have not seen any option so far, but I assumed there is one, as the Storage section of the UI shows 1 x replica of your …
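I'm not aware of a SQL-level knob for cache replication, but at the DataFrame level a replicated storage level can be requested explicitly. A sketch (the table name is a placeholder; the `_2` storage levels keep two replicas of each cached block):

```scala
import org.apache.spark.storage.StorageLevel

val df = sqlContext.table("my_table")      // hypothetical table
df.persist(StorageLevel.MEMORY_ONLY_2)     // 2 in-memory replicas per block
df.count()                                 // materialize (replicates as it caches)
```

The replica count then shows up in the Storage tab of the UI, matching the "1 x Replicated" label mentioned above.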

how to cache table with OFF_HEAP storage level in SparkSQL thriftserver

2015-03-23 Thread LiuZeshan
Hi all, I have a Spark on YARN cluster (spark-1.3.0, hadoop-2.2.0) with hive-0.12.0 and tachyon-0.6.1. I start the SparkSQL thriftserver with start-thriftserver.sh and use beeline to connect to the thriftserver according to the Spark documentation. My question is: how to cache a table with a specified …
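`CACHE TABLE` itself takes no storage-level clause in Spark 1.3, but a similar effect can be approximated by persisting the DataFrame with `OFF_HEAP` (which, in the Spark 1.x line, stores blocks in Tachyon) and exposing it to SQL. A sketch under those assumptions, with a placeholder table name:

```scala
import org.apache.spark.storage.StorageLevel

val df = hiveContext.table("my_table")      // hypothetical Hive table
df.persist(StorageLevel.OFF_HEAP)           // in Spark 1.3 this targets Tachyon
df.registerTempTable("my_table_offheap")    // queryable via the Thrift server session
df.count()                                  // materialize the off-heap cache
```

This requires `spark.externalBlockStore`/Tachyon settings to point at the running Tachyon master; the exact configuration keys differ across 1.x releases.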

Re: HiveContext: cache table not supported for partitioned table?

2014-10-03 Thread Du Li
…To: user@spark.apache.org Cc: user@spark.apache.org Subject: Re: HiveContext: cache table not supported for partitioned table? Cache table works with partitioned table. I guess you're experimenting with a default local metastore …

HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Du Li
Hi, In Spark 1.1 HiveContext, I ran a create partitioned table command followed by a cache table command and got a java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist. But cache table worked fine if the table is not a partitioned table. Can anybody confirm that cache …
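A minimal reproduction of the symptom described, as I read it (Spark 1.1 `HiveContext`; table and column names are placeholders):

```scala
hiveContext.sql("CREATE TABLE t (v STRING) PARTITIONED BY (d STRING)")
hiveContext.sql("CACHE TABLE t")
// Reported result with a fresh local Derby metastore:
//   java.sql.SQLSyntaxErrorException: Table/View 'PARTITIONS' does not exist
// while caching a non-partitioned table succeeded.
```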

Re: HiveContext: cache table not supported for partitioned table?

2014-10-02 Thread Cheng Lian
Cache table works with partitioned table. I guess you're experimenting with a default local metastore and the metastore_db directory doesn't exist in the first place. In this case, all metastore tables/views don't exist at first and will throw the error message you saw when the PARTITIONS …

Re: Potential Thrift Server Bug on Spark SQL,perhaps with cache table?

2014-08-25 Thread Cheng Lian
…Types: col1 = STRING, col2 = STRING, col3 = STRING, col4 = Partition Field (TYPE STRING). Queries: cache table table1; --Run some other queries on other data; select col1 from table1 where col2 = 'foo' and col3 = 'bar' and col4 = 'foobar' and col1 is not null limit 100. Fairly simple query. When I …

cache table with JDBC

2014-08-22 Thread ken
I am using Spark's Thrift server to connect to Hive and use JDBC to issue queries. Is there a way to cache a table in Spark by using a JDBC call? Thanks, Ken
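In principle, `CACHE TABLE` can be sent over JDBC like any other statement, since the Thrift server executes it inside its own Spark context. A sketch with the HiveServer2 JDBC driver (host, port, and table name are placeholders):

```scala
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
stmt.execute("CACHE TABLE my_table")  // cached inside the Thrift server's JVM,
                                      // so it benefits every connected client
val rs = stmt.executeQuery("SELECT count(*) FROM my_table")
rs.close(); stmt.close(); conn.close()
```

The cache lives as long as the Thrift server process (or until an `UNCACHE TABLE`), not as long as the JDBC connection.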

Potential Thrift Server Bug on Spark SQL,perhaps with cache table?

2014-08-20 Thread John Omernik
…Built in (and Thrift Server). My query is only selecting one STRING column from the data, but only returning data based on other columns. Types: col1 = STRING, col2 = STRING, col3 = STRING, col4 = Partition Field (TYPE STRING). Queries: cache table table1; --Run some other queries on other data …