Re: Re: Low throughput and effect of GC in Spark SQL GROUP BY

2015-05-21 Thread zhangxiongfei
Hi Pramod, is your data compressed? I encountered a similar problem; however, after turning codegen on, the GC time was still very long. The input for my map task is an LZO file of about 100 MB. My query is: select ip, count(*) as c from stage_bitauto_adclick_d group by ip sort by c limit 100
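
For context, a minimal sketch of how the codegen flag was toggled in a 1.3/1.4-era Spark shell before running the aggregate; the table name is the one from the post, everything else is illustrative:

  // Enable Spark SQL code generation (an opt-in flag before Spark 1.5).
  sqlContext.setConf("spark.sql.codegen", "true")

  // The GROUP BY query from the post.
  val top = sqlContext.sql(
    "SELECT ip, count(*) AS c FROM stage_bitauto_adclick_d GROUP BY ip SORT BY c LIMIT 100")
  top.collect().foreach(println)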

Hive cannot get the schema of an external table created by the Spark SQL API createExternalTable

2015-05-07 Thread zhangxiongfei
Hi, I was trying to create an external table named adclicktable via the API def createExternalTable(tableName: String, path: String). I can then get the schema of this table successfully, as shown below, and the table can be queried normally. The data files are all Parquet files. sqlContext.sql(describe
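
A minimal sketch of that call, assuming a HiveContext in the Spark shell; the path argument is illustrative, not the one from the original post:

  // Register an external table backed by Parquet files at the given path
  // (hypothetical location), then ask Spark SQL for its schema.
  sqlContext.createExternalTable("adclicktable", "hdfs:///user/zhangxf/adclick-parquet")
  sqlContext.sql("describe adclicktable").collect().foreach(println)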

Why do the HDFS Parquet files generated by Spark SQL have a different size from those on Tachyon?

2015-04-17 Thread zhangxiongfei
Hi, I did some tests on Parquet files with the Spark SQL DataFrame API. I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on Tachyon; each file is about 222 MB. Then I read them with the code below. val tfs
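
A sketch of the write/read round trip described above, assuming the 1.3-era DataFrame API; the source table and the Tachyon URI are illustrative, and the codec setting simply mirrors the gzip compression mentioned in the post:

  // Write side (sketch): use the gzip codec for the generated Parquet files.
  sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
  val src = sqlContext.table("stage_bitauto_adclick_d")   // hypothetical source table
  src.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick")

  // Read side: load the files back from Tachyon (illustrative path) and count rows.
  val tfs = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick")
  println(tfs.count())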

Re: Re: Spark SQL 1.3.1 saveAsParquetFile outputs Tachyon files with a different block size

2015-04-14 Thread zhangxiongfei
zhangxiongfei wrote: Hi experts, I ran the code below in the Spark shell to access Parquet files in Tachyon. 1. First, created a DataFrame by loading a bunch of Parquet files in Tachyon: val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); 2. Second
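
A sketch of those quoted steps, assuming the Spark 1.3.1 shell; the load path is the one from the post, while the output path for the write-back is illustrative because the original message is truncated:

  // Step 1 (from the post): load the Parquet files stored in Tachyon.
  val ta3 = sqlContext.parquetFile(
    "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m")

  // Step 2 (sketch): save the DataFrame back to Tachyon as Parquet;
  // the output path here is hypothetical.
  ta3.saveAsParquetFile(
    "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-resaved")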