Hi Neelesh, as I said in my emails, it's not a Spark/Scala application; I am working in Spark SQL only.
I am launching the spark-sql shell and running my Hive code inside it. The spark-sql shell accepts Spark SQL functions; it does not accept Scala API functions such as coalesce. What I am trying to do is below:

FROM (SELECT * FROM source_table WHERE load_date = "2016-09-23") a
INSERT OVERWRITE TABLE target_table
SELECT *;

Thanks
Sri

Sent from my iPhone

> On 1 Jul 2016, at 17:35, nsalian [via Apache Spark User List]
> <ml-node+s1001560n27265...@n3.nabble.com> wrote:
>
> Hi Sri,
>
> Thanks for the question.
> You can simply start by doing this in the initial stage:
>
> val sqlContext = new SQLContext(sc)
> val customerList = sqlContext.read.json(args(0)).coalesce(20) // using a JSON example here
>
> where the argument is the path to the file(s). This will reduce the
> number of partitions.
> You can proceed with repartitioning the data further on. The goal is to
> reduce the number of files in the end when you do a saveAsParquet.
>
> Hope that helps.
> Neelesh S. Salian
> Cloudera
>
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27265.html

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27266.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
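Since only SQL is available in the spark-sql shell, a pure-SQL sketch of how the small-files problem could be attacked (this is an assumption based on how Spark plans shuffles, not something confirmed in the thread; the table and column names are taken from the query above) is to lower the shuffle-partition count for the session and then force a shuffle before the write with DISTRIBUTE BY, since each final task writes one output file:

```sql
-- Lower the number of shuffle partitions for this session:
-- fewer final tasks means fewer output files.
SET spark.sql.shuffle.partitions=20;

-- A plain INSERT ... SELECT has no shuffle, so the setting alone has no
-- effect; DISTRIBUTE BY forces a shuffle stage before the write.
FROM (SELECT * FROM source_table
      WHERE load_date = "2016-09-23"
      DISTRIBUTE BY load_date) a
INSERT OVERWRITE TABLE target_table
SELECT *;
```

Note that distributing by a low-cardinality column (here every row has the same load_date) collapses the output to very few partitions, so this should be checked against the data volume before relying on it.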