[ 
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270755#comment-14270755
 ] 

Qian Xu edited comment on SQOOP-1811 at 1/9/15 8:30 AM:
--------------------------------------------------------

Regardless of which IDF class we use, we should validate its syntax, unless the 
IDF class handles raw strings (text), right? The third case, for "fast 
connectors", looks like one where we do not want strict CSV validation. I think 
you mean a {{TextIntermediateDataFormat}}. 

Here is an example: we use {{mysqldump}} to copy data to HDFS as text. Without 
the current optimization, both the JDBC connector and the HDFS connector will 
use CSVIDF, so data will be converted into short-lived objects row by row. This 
is inefficient. 

I'm thinking that if we provide an argument {{--no-validate}}, both the JDBC 
connector and the HDFS connector will use TextIDF. They will then exchange data 
using {{getData}} and {{setData}} without doing any conversions.
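
To make the difference concrete, here is a minimal sketch of the two paths. All names in it ({{RecordFormat}}, {{ParsingFormat}}, {{RawTextFormat}}, {{PassThroughDemo}}) are hypothetical and are not the Sqoop 2 API; the comma split is deliberately naive and ignores the real CSV quoting and escaping rules.

```java
// Hypothetical sketch; these names are illustrative and are not the Sqoop 2 API.
import java.util.ArrayList;
import java.util.List;

/** Minimal stand-in for an intermediate data format (IDF) holding one record. */
interface RecordFormat<T> {
    T getData();
    void setData(T data);
}

/** Parses each line into per-field objects, roughly what a CSV-style IDF does. */
final class ParsingFormat implements RecordFormat<List<String>> {
    private List<String> fields = new ArrayList<>();

    @Override public List<String> getData() { return fields; }
    @Override public void setData(List<String> data) { this.fields = data; }

    /** Naive split on commas; real CSV rules (quoting, escaping) are omitted. */
    void setText(String line) {
        List<String> parsed = new ArrayList<>();
        for (String field : line.split(",", -1)) {
            parsed.add(field); // one short-lived object per field, per row
        }
        this.fields = parsed;
    }

    String getText() { return String.join(",", fields); }
}

/** Keeps the raw line untouched: no parsing and no validation. */
final class RawTextFormat implements RecordFormat<String> {
    private String line;

    @Override public String getData() { return line; }
    @Override public void setData(String data) { this.line = data; }
}

final class PassThroughDemo {
    public static void main(String[] args) {
        String row = "1,'alice',2015-01-09";

        RawTextFormat raw = new RawTextFormat();
        raw.setData(row);            // "from" side stores the dumped line as-is
        String out = raw.getData();  // "to" side reads the very same String

        System.out.println(out == row); // prints true: same object, no copies
    }
}
```

With the raw text format, the only object that crosses between the two connectors is the original line, which is the saving the {{--no-validate}} idea is after.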


was (Author: stanleyxu2005):
Regardless of which IDF class we use, we should validate its syntax, unless the 
IDF class handles raw strings (or text), right? The third case, for "fast 
connectors", looks like one where we do not do strict CSV validation. I think 
you mean a {{TextIntermediateDataFormat}}. 

Here is an example: we use {{mysqldump}} to copy data to HDFS as text. Without 
the current optimization, both the JDBC connector and the HDFS connector will 
use CSVIDF, so data will be converted into short-lived objects row by row. This 
is inefficient. 

I'm thinking that if we provide an argument {{--no-validate}}, both the JDBC 
connector and the HDFS connector will use TextIDF. They will then exchange data 
using {{getData}} and {{setData}} without doing any conversions.

> Sqoop2: IDF API changes
> -----------------------
>
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>         Attachments: SQOOP-1811.patch
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and rename the accessors getCSV and setCSV, so it 
> is obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and be 
> made final, so there is no way to override it and we can force every 
> implementation to return String instead of the generic T.
> {code}
> // hold the string in the IDF base class
> // (note: "final" conflicts with the setter; see the correction below)
>  private final String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
> There is code in the CSVIDF implementation that holds the rules for CSV 
> parsing. It can be pulled out into CSV utils so that the connectors can use 
> it.
> The T in CSVIDF happens to be String, which is just a coincidence. If I write 
> a new IDF implementation, T can be a custom object that encapsulates the 
> whole row.
> Third, getData and setData can have custom implementations, so they can be 
> overridden to return the generic type T.
> Correction:
> {code}
> // hold the string in the IDF base class; the field is not final
>  private String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
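
As a rough illustration of the proposal quoted above, the sketch below keeps the CSV text accessors final in a base class while leaving getData and setData open for each implementation's own type T. The class names ({{IdfBase}}, {{CsvIdf}}, {{Row}}, {{RowIdf}}) are hypothetical and are not the actual Sqoop 2 classes. Keeping the CSV text in sync from setData is an assumption of this sketch, and the comma join ignores quoting and escaping.

```java
// Hypothetical sketch of the proposal; these are not the actual Sqoop 2 classes.

/** Base class: the CSV text accessors are final; getData/setData stay overridable. */
abstract class IdfBase<T> {
    // Held in the base class; not final because setCSVTextData assigns it.
    private String text;

    public final String getCSVTextData() {
        return text;
    }

    public final void setCSVTextData(String text) {
        this.text = text;
    }

    public abstract T getData();

    public abstract void setData(T data);
}

/** For a CSV-backed format, T happens to be String. */
final class CsvIdf extends IdfBase<String> {
    @Override public String getData() { return getCSVTextData(); }
    @Override public void setData(String data) { setCSVTextData(data); }
}

/** A custom object that encapsulates the whole row. */
final class Row {
    final long id;
    final String name;

    Row(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** A format whose native type T is Row rather than String. */
final class RowIdf extends IdfBase<Row> {
    private Row row;

    @Override public Row getData() { return row; }

    @Override public void setData(Row data) {
        this.row = data;
        // Assumption of this sketch: keep the CSV text in sync (naive join).
        setCSVTextData(data.id + "," + data.name);
    }
}

final class IdfBaseDemo {
    public static void main(String[] args) {
        RowIdf idf = new RowIdf();
        idf.setData(new Row(7, "alice"));
        System.out.println(idf.getCSVTextData()); // prints 7,alice
    }
}
```

Because the CSV accessors are final, a subclass such as {{RowIdf}} cannot change what getCSVTextData returns; it can only choose its own T for getData and setData.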



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
