Hi all,

  I ran into a data loss problem when loading data from a CSV file into a 
Carbon table. Here are some details:


  Env: Spark 2.1.0 + Hadoop 2.7.2 + CarbonData 1.0.0
  Total Records: 719,384
  Loaded Records: 606,305 (SQL: select count(1) from table)
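
  For reference, the load looks roughly like this; the store path, table 
name, and file location below are placeholders, not my real ones:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.CarbonSession._

    // Create a CarbonSession against the store (placeholder paths).
    val carbon = SparkSession.builder()
      .appName("CarbonLoadTest")
      .getOrCreateCarbonSession("hdfs://namenode:9000/carbon/store")

    // Load the pipe-delimited CSV into the Carbon table.
    carbon.sql(
      """LOAD DATA INPATH 'hdfs://namenode:9000/data/source.csv'
        |INTO TABLE my_table
        |OPTIONS('DELIMITER'='|', 'BAD_RECORDS_ACTION'='FORCE')
      """.stripMargin)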


  My Attempts:


    Attempt 1: Added the option bad_records_action='force' when loading. It 
doesn't help; the count still equals 606,305.
    Attempt 2: Cut lines 1 to 300,000 into a CSV file and loaded it; the 
result is correct, 300,000.
    Attempt 3: Cut lines 1 to 350,000 into a CSV file and loaded it; the 
result is wrong, 305,631.
    Attempt 4: Cut lines 300,000 to 350,000 into a CSV file and loaded it; 
the result is correct, 50,000.
    Attempt 5: Counted the separator '|' in my CSV file; the total equals 
lines * columns, so the source data is likely in the correct format (see 
the snippet after this list).
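
    This is roughly how I checked the separators per line for Attempt 5; 
the path is a placeholder, and the expected count is just taken from the 
first line:

    import scala.io.Source

    // Verify every line has the same number of '|' separators as line 1.
    val path = "/data/source.csv"
    val expected = Source.fromFile(path).getLines().next().count(_ == '|')

    Source.fromFile(path).getLines().zipWithIndex.foreach { case (line, i) =>
      val n = line.count(_ == '|')
      if (n != expected) println(s"line ${i + 1}: $n separators (expected $expected)")
    }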


    In the Spark log, every attempt prints "Bad Record Found".
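
    To dig further, I plan to redirect the bad records to disk so I can see 
which rows are rejected. A sketch based on the bad-records options in the 
docs; the property key and paths may need adjusting for 1.0.0:

    import org.apache.carbondata.core.util.CarbonProperties

    // Tell CarbonData where to write rejected rows (placeholder path).
    CarbonProperties.getInstance()
      .addProperty("carbon.badRecords.location",
                   "hdfs://namenode:9000/carbon/badrecords")

    // `carbon` is the CarbonSession from the earlier sketch.
    carbon.sql(
      """LOAD DATA INPATH 'hdfs://namenode:9000/data/source.csv'
        |INTO TABLE my_table
        |OPTIONS('DELIMITER'='|',
        |        'BAD_RECORDS_LOGGER_ENABLE'='true',
        |        'BAD_RECORDS_ACTION'='REDIRECT')
      """.stripMargin)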


    Anyone have any ideas?
