I'm using Hadoop 1.0.4.

Thanks,
--
Pei-Lun

On Fri, Mar 27, 2015 at 2:32 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

>  Hm, which version of Hadoop are you using? Actually there should also be
> a _metadata file together with _common_metadata. I was using Hadoop 2.4.1
> btw. I'm not sure whether Hadoop version matters here, but I did observe
> cases where Spark behaves differently because of semantic differences of
> the same API in different Hadoop versions.
>
> Cheng
>
> On 3/27/15 11:33 AM, Pei-Lun Lee wrote:
>
> Hi Cheng,
>
> On my computer, executing
> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite) produces:
>
>  peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
> total 32
> -rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
> -rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*
>
>  while res0.save("xxx") produces:
>
>  peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
> total 40
> -rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
> -rwxrwxrwx  1 peilunlee  staff  250 Mar 27 11:29 _common_metadata*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
> -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
> -rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*
>
> On Thu, Mar 26, 2015 at 7:26 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>>  I couldn’t reproduce this with the following spark-shell snippet:
>>
>> scala> import sqlContext.implicits._
>> scala> Seq((1, 2)).toDF("a", "b")
>> scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>> scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>>
>> The _common_metadata file is typically much smaller than _metadata,
>> because it doesn’t contain row group information, and thus can be faster to
>> read than _metadata.
>>
>> Cheng
>>
>> On 3/26/15 12:48 PM, Pei-Lun Lee wrote:
>>
>> Hi,
>>
>> When I save a Parquet file with SaveMode.Overwrite, it never generates
>> _common_metadata, whether or not it overwrites an existing directory.
>> Is this expected behavior?
>> And what is the benefit of _common_metadata? Will reads perform better
>> when it is present?
>>
>>  Thanks,
>> --
>> Pei-Lun
>>
>>
>
>
>
