I'm using Hadoop 1.0.4.

Thanks,
--
Pei-Lun

On Fri, Mar 27, 2015 at 2:32 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hm, which version of Hadoop are you using? Actually there should also be
> a _metadata file together with _common_metadata. I was using Hadoop 2.4.1
> btw. I'm not sure whether the Hadoop version matters here, but I did observe
> cases where Spark behaves differently because of semantic differences of
> the same API in different Hadoop versions.
>
> Cheng
>
> On 3/27/15 11:33 AM, Pei-Lun Lee wrote:
>
> Hi Cheng,
>
> On my computer, executing
> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite) produces:
>
> peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
> total 32
> -rwxrwxrwx 1 peilunlee staff   0 Mar 27 11:29 _SUCCESS*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00001.parquet*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00002.parquet*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00003.parquet*
> -rwxrwxrwx 1 peilunlee staff 488 Mar 27 11:29 part-r-00004.parquet*
>
> while res0.save("xxx") produces:
>
> peilunlee@pllee-mini:~/opt/spark-1.3...rc3-bin-hadoop1$ ls -l xxx
> total 40
> -rwxrwxrwx 1 peilunlee staff   0 Mar 27 11:29 _SUCCESS*
> -rwxrwxrwx 1 peilunlee staff 250 Mar 27 11:29 _common_metadata*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00001.parquet*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00002.parquet*
> -rwxrwxrwx 1 peilunlee staff 272 Mar 27 11:29 part-r-00003.parquet*
> -rwxrwxrwx 1 peilunlee staff 488 Mar 27 11:29 part-r-00004.parquet*
>
> On Thu, Mar 26, 2015 at 7:26 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> I couldn’t reproduce this with the following spark-shell snippet:
>>
>> scala> import sqlContext.implicits._
>> scala> Seq((1, 2)).toDF("a", "b")
>> scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>> scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
>>
>> The _common_metadata file is typically much smaller than _metadata,
>> because it doesn’t contain row group information, and thus can be
>> faster to read than _metadata.
>>
>> Cheng
>>
>> On 3/26/15 12:48 PM, Pei-Lun Lee wrote:
>>
>> Hi,
>>
>> When I save a parquet file with SaveMode.Overwrite, it never generates
>> _common_metadata, regardless of whether it overwrites an existing dir.
>> Is this expected behavior?
>> And what is the benefit of _common_metadata? Will reading perform better
>> when it is present?
>>
>> Thanks,
>> --
>> Pei-Lun
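
To compare the two outputs quoted in this thread without reading `ls -l` by eye, a small check like the following could report which Parquet summary files are present in an output directory. This is only a sketch: `SummaryFileCheck` and `present` are hypothetical names, not part of Spark or Parquet, and it only inspects the local filesystem, so it assumes the output was written to a local path such as the `xxx` directory used in the thread.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper (not a Spark or Parquet API): reports which of the
// summary/marker files discussed in this thread exist in an output directory.
object SummaryFileCheck {
  // File names taken from the thread: _SUCCESS, _metadata, _common_metadata.
  val summaryFiles: Seq[String] = Seq("_SUCCESS", "_metadata", "_common_metadata")

  // Maps each expected file name to whether it exists under `dir`.
  def present(dir: String): Map[String, Boolean] =
    summaryFiles.map(name => name -> Files.exists(Paths.get(dir, name))).toMap

  def main(args: Array[String]): Unit = {
    // Defaults to the "xxx" directory used in the thread (an assumption).
    val dir = args.headOption.getOrElse("xxx")
    present(dir).foreach { case (name, found) =>
      println(s"$name: ${if (found) "present" else "missing"}")
    }
  }
}
```

Running it once after `res0.save("xxx")` and once after the `SaveMode.Overwrite` variant would show whether `_common_metadata` and `_metadata` differ between the two runs on a given Hadoop version.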