Thank you for the opinion, Felix. Bests, Dongjoon.
On Sat, Dec 10, 2016 at 11:00 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> +1 I think it's useful to always have a pure SQL way and skip header for
> plain text / csv that lots of companies have.
>
> ------------------------------
> *From:* Dongjoon Hyun <dongj...@apache.org>
> *Sent:* Friday, December 9, 2016 9:42:58 AM
> *To:* Dongjin Lee; dev@spark.apache.org
> *Subject:* Re: Question about SPARK-11374 (skip.header.line.count)
>
> Thank you for the opinion, Dongjin!
>
> On Thu, Dec 8, 2016 at 21:56 Dongjin Lee <dong...@apache.org> wrote:
>
>> +1 For this idea. I need it also.
>>
>> Regards,
>> Dongjin
>>
>> On Fri, Dec 9, 2016 at 8:59 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>
>>> Hi, All.
>>>
>>> Could you give me your opinions?
>>>
>>> There is an old SPARK issue, SPARK-11374, about removing header lines
>>> from text files.
>>> Currently, Spark supports removing CSV header lines in the following way.
>>>
>>> ```
>>> scala> spark.read.option("header","true").csv("/data").show
>>> +---+---+
>>> | c1| c2|
>>> +---+---+
>>> |  1|  a|
>>> |  2|  b|
>>> +---+---+
>>> ```
>>>
>>> In the SQL world, we can support that in the Hive way, with
>>> `skip.header.line.count`.
>>>
>>> ```
>>> scala> sql("CREATE TABLE t1 (id INT, value VARCHAR(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data' TBLPROPERTIES('skip.header.line.count'='1')")
>>> scala> sql("SELECT * FROM t1").show
>>> +---+-----+
>>> | id|value|
>>> +---+-----+
>>> |  1|    a|
>>> |  2|    b|
>>> +---+-----+
>>> ```
>>>
>>> Although I made a PR for this based on the JIRA issue, I want to know
>>> whether this is a really needed feature.
>>> Is it needed for your use cases? Or is it enough for you to remove the
>>> header lines in a preprocessing stage?
>>> If this is too old and no longer appropriate these days, I'll close the
>>> PR and the JIRA issue as WON'T FIX.
>>>
>>> Thank you all in advance!
>>>
>>> Bests,
>>> Dongjoon.
>>
>> --
>> Dongjin Lee
>>
>> Software developer in Line+. So interested in massive-scale machine learning.
>> facebook: www.facebook.com/dongjin.lee.kr
>> linkedin: kr.linkedin.com/in/dongjinleekr
>> github: github.com/dongjinleekr
>> twitter: www.twitter.com/dongjinleekr
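
For anyone following the thread, here is a minimal plain-Scala sketch of the behavior being discussed. It is not Spark's or Hive's implementation and is not taken from the PR. It assumes the skip count applies to each file under the table location separately, and `skipHeaderLines`, `file1`, and `file2` are hypothetical names used only for illustration.

```scala
// Hypothetical helper, not part of Spark or Hive: drop the first `count`
// lines of one file's contents, which is what a per-file
// `skip.header.line.count` setting is assumed to do here.
def skipHeaderLines(lines: Seq[String], count: Int): Seq[String] =
  lines.drop(math.max(count, 0))

// Two hypothetical files in the same location, each with its own header line.
val file1 = Seq("id,value", "1,a")
val file2 = Seq("id,value", "2,b")

// Skip the header in each file first, then combine the remaining rows.
val rows = Seq(file1, file2).flatMap(skipHeaderLines(_, 1))

rows.foreach(println) // prints "1,a" then "2,b"
```

Skipping per file rather than once per table matters when a directory holds several exported files that each repeat the header line.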