Thanks Jacky. I have created a JIRA - https://issues.apache.org/jira/browse/CARBONDATA-909 for this.
Thanks,
Sanoj

On Tue, Apr 11, 2017 at 5:42 PM, Jacky Li <jacky.li...@qq.com> wrote:

> Hi Sanoj,
>
> This is because the CarbonData loading flow needs to scan the input data
> twice (once to generate the global dictionary, and again for the actual
> load). If the user is writing a DataFrame to CarbonData and computing the
> input DataFrame is costly, it is better to save it as a temporary CSV file
> first and load that into CarbonData, rather than computing the DataFrame
> twice.
>
> However, there is another option that can do a single-pass data load, via
> .option("single_pass", "true"); in that case, the input DataFrame should
> be computed only once. But when I checked the code just now, it seems this
> behavior is not implemented. :(
> I think you are free to create a JIRA ticket if you want.
>
> Regards,
> Jacky
>
>> On Apr 11, 2017, at 10:36 AM, Sanoj MG <sanoj.george....@gmail.com> wrote:
>>
>> Hi All,
>>
>> In CarbonDataFrameWriter, there is an option to load via a temporary CSV
>> file:
>>
>> if (options.tempCSV) {
>>   loadTempCSV(options)
>> } else {
>>   loadDataFrame(options)
>> }
>>
>> Why is this choice required? Is there any issue if we load directly,
>> without going through CSV?
>>
>> I have many dimension tables with commas in string columns, and so I
>> always use .option("tempCSV", "false").
>> In CarbonOption, can we set the default value to "false", as below?
>>
>> def tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
>>
>> Thanks,
>> Sanoj
>>
>>
>> On Thu, Mar 30, 2017 at 12:14 PM, Sanoj MG (JIRA) <j...@apache.org> wrote:
>>
>>> Sanoj MG created CARBONDATA-836:
>>> -----------------------------------
>>>
>>> Summary: Error in load using dataframe - columns containing comma
>>> Key: CARBONDATA-836
>>> URL: https://issues.apache.org/jira/browse/CARBONDATA-836
>>> Project: CarbonData
>>> Issue Type: Bug
>>> Components: spark-integration
>>> Affects Versions: 1.1.0-incubating
>>> Environment: HDP sandbox 2.5, Spark 1.6.2
>>> Reporter: Sanoj MG
>>> Priority: Minor
>>> Fix For: NONE
>>>
>>> While trying to load data into a CarbonData table using a DataFrame,
>>> columns containing commas are not loaded properly.
>>>
>>> Eg:
>>> scala> df.show(false)
>>> +-------+------+-----------+----------------+---------+------+
>>> |Country|Branch|Name       |Address         |ShortName|Status|
>>> +-------+------+-----------+----------------+---------+------+
>>> |2      |1     |Main Branch|XXXX, Dubai, UAE|UHO      |256   |
>>> +-------+------+-----------+----------------+---------+------+
>>>
>>> scala> df.write.format("carbondata").option("tableName", "Branch1")
>>>          .option("compress", "true").mode(SaveMode.Overwrite).save()
>>>
>>> scala> cc.sql("select * from branch1").show(false)
>>>
>>> +-------+------+-----------+-------+---------+------+
>>> |country|branch|name       |address|shortname|status|
>>> +-------+------+-----------+-------+---------+------+
>>> |2      |1     |Main Branch|XXXX   | Dubai   |null  |
>>> +-------+------+-----------+-------+---------+------+
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.3.15#6346)
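
[Editor's note: the corrupted row in CARBONDATA-836 is the classic unquoted-CSV failure mode — the address "XXXX, Dubai, UAE" gets split at its embedded commas on the way back in. A minimal, self-contained Scala sketch of that failure mode follows; the `toCsvLine`/`fromCsvLine` helpers are hypothetical illustrations of naive CSV handling, not actual CarbonData code.]

```scala
// Sketch: why a temp-CSV round trip without quoting corrupts columns
// that contain commas (mirrors the symptom in CARBONDATA-836).
object TempCsvCommaDemo {
  // Hypothetical writer: joins fields with commas, with no quoting/escaping.
  def toCsvLine(row: Seq[String]): String = row.mkString(",")

  // Hypothetical reader: splits on every comma, as a naive CSV parser would.
  def fromCsvLine(line: String): Seq[String] = line.split(",", -1).toSeq

  def main(args: Array[String]): Unit = {
    // The 6-column row from the bug report.
    val row = Seq("2", "1", "Main Branch", "XXXX, Dubai, UAE", "UHO", "256")
    val parsed = fromCsvLine(toCsvLine(row))
    // The 6-column row comes back as 8 fields: the address split into three.
    println(parsed.length) // prints 8
    println(parsed(3))     // prints XXXX
  }
}
```

The practical workaround discussed in the thread is `.option("tempCSV", "false")`, which bypasses the temporary CSV path entirely; the underlying fix would be proper quoting/escaping when writing and reading the temp CSV.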