Hi, all,

Trafodion can bulk load data from HDFS into Trafodion tables. Currently, the 
load succeeds only if the source data meets some strict requirements.
Typically, the source data should be clean and contain relatively little 
'dirty' data. However, in some special cases the source data contains special 
values that we hope Trafodion can handle automatically:


Automatically remove '\r' when it appears as part of '\r\n', the DOS-format 
line delimiter.

Do not raise a SQL error; instead, convert bad data into null automatically, 
and still be able to log this to the error log files when required, so that the 
change is not silent and the action is traceable.

Allow '\n' inside a data field even though '\n' is the line terminator.

Automatically truncate overflowed strings and log the truncation to the error 
log file to make it traceable.
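
To make the requested behaviour concrete, here is a rough sketch in Python of 
what such a tolerant mode might do for one input record. This is only an 
illustration, not Trafodion code: the column list, the '|' delimiter, the 
function name and the log message wording are all assumptions made for the 
example. It does not cover the embedded '\n' case.

```python
from typing import List, Optional, Tuple

# Hypothetical column specs: (name, kind, max_len). Illustration only,
# this is not how Trafodion describes a table.
COLUMNS = [("id", "int", None), ("name", "char", 5)]


def tolerant_parse(line: str, delimiter: str = "|") -> Tuple[List[Optional[object]], List[str]]:
    """Parse one record leniently and return (values, log_entries)."""
    log: List[str] = []

    # Drop '\r' only when it is part of a DOS '\r\n' line ending.
    if line.endswith("\r\n"):
        line = line[:-2]
    elif line.endswith("\n"):
        line = line[:-1]

    values: List[Optional[object]] = []
    for (name, kind, max_len), raw in zip(COLUMNS, line.split(delimiter)):
        if kind == "int":
            try:
                values.append(int(raw))
            except ValueError:
                # Bad data becomes NULL, and the conversion is logged.
                values.append(None)
                log.append(f"{name}: bad value {raw!r} converted to NULL")
        else:
            if max_len is not None and len(raw) > max_len:
                # Overflowed strings are truncated, and the truncation is logged.
                log.append(f"{name}: value {raw!r} truncated to {max_len} characters")
                raw = raw[:max_len]
            values.append(raw)
    return values, log
```

For example, tolerant_parse("x1|alexander\r\n") would yield a null id and the 
name cut to 'alexa', with one log entry for each conversion, so nothing is 
changed silently.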

When the source data has the 'issues' above, we currently have to run a special 
'data clean' process before loading the data: convert DOS format into Unix 
format, then find the bad data and remove it. However, products like Hive can 
handle such 'bad' data as described above. So it would be helpful if Trafodion 
introduced a special mode that provides the same 'tolerance' during bulk load, 
so that a user who is sure these conversions are desired does not need the 
extra 'data clean' process. This is especially useful when the data is shared 
by Trafodion and other products like Hive.

I will file a JIRA if there are no objections here. Any suggestions and ideas 
are welcome!

Thanks,
Ming
