[
https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270755#comment-14270755
]
Qian Xu edited comment on SQOOP-1811 at 1/9/15 8:32 AM:
--------------------------------------------------------
Regardless of which IDF class we use, we should validate its syntax, unless the IDF
class handles raw strings (text), right? The third case, for "fast connectors", looks
like one where we do not want strict CSV validation. I think you mean a
{{TextIntermediateDataFormat}}.
Here is an example: we use {{mysqldump}} to copy data to HDFS as text. Without
the current optimization, the JDBC connector and the HDFS connector will both
use CSVIDF. Data will be converted into short-lived objects row by row, which is
inefficient.
I'm thinking that if we provide an argument {{--no-validate}}, both the JDBC connector
and the HDFS connector will use TextIDF, so they will exchange data using {{getData}}
and {{setData}} without doing any conversions.
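A minimal sketch of the pass-through idea follows. It is not Sqoop's actual API: `SketchIdf`, `CsvSketchIdf`, `TextSketchIdf`, and the `create(boolean noValidate)` factory are hypothetical names, and a naive comma split stands in for real CSV validation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdfPassThroughSketch {

    /** Hypothetical stand-in for an intermediate data format; not the real Sqoop class. */
    interface SketchIdf {
        void setData(String row);
        String getData();
    }

    /** Validating variant: every row is parsed into short-lived field objects. */
    static final class CsvSketchIdf implements SketchIdf {
        private List<String> fields = new ArrayList<>();

        @Override
        public void setData(String row) {
            if (row == null || row.isEmpty()) {
                throw new IllegalArgumentException("empty row");
            }
            // Naive split for illustration; real CSV parsing must handle quoting and escaping.
            fields = new ArrayList<>(Arrays.asList(row.split(",", -1)));
        }

        @Override
        public String getData() {
            return String.join(",", fields);
        }
    }

    /** Pass-through variant: keeps the raw text and performs no conversion. */
    static final class TextSketchIdf implements SketchIdf {
        private String text;

        @Override
        public void setData(String row) {
            this.text = row;
        }

        @Override
        public String getData() {
            return text;
        }
    }

    /** Hypothetical factory keyed on a --no-validate style switch. */
    static SketchIdf create(boolean noValidate) {
        return noValidate ? new TextSketchIdf() : new CsvSketchIdf();
    }

    public static void main(String[] args) {
        SketchIdf idf = create(true);
        idf.setData("1,'alice',42");
        System.out.println(idf.getData());
    }
}
```

With the switch on, the row written by one side is handed to the other side as the same string, so no per-row objects are created; with it off, each row goes through the parsing step.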
> Sqoop2: IDF API changes
> -----------------------
>
> Key: SQOOP-1811
> URL: https://issues.apache.org/jira/browse/SQOOP-1811
> Project: Sqoop
> Issue Type: Sub-task
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
> Attachments: SQOOP-1811.patch
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData and its setter final and call them getCSV and setCSV, so it is
> obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and
> be made final, so there is no way to override it and we can require all
> implementations to return String instead of the generic T.
> {code}
> // hold the string in the IDF base class
> private final String text;
>
> public final String getCSVTextData() {
>   return text;
> }
>
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing;
> it can be pulled out into CSV utils so that the connectors can use it.
> The T in CSVIDF happens to be String, which is just a coincidence. If I write a new
> IDF implementation, T can be a custom object that encapsulates the whole
> row.
> Third, getData and setData can have custom implementations, so they can be
> overridden to return the generic type T.
> Correction:
> {code}
> // hold the string in the IDF base class; the field is not final
> private String text;
>
> public final String getCSVTextData() {
>   return text;
> }
>
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}
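The corrected proposal in the quoted description can be sketched as a small self-contained example. This is an illustration under assumptions, not the Sqoop source: `BaseIdf`, `Row`, and `RowIdf` are hypothetical names, and the comma split ignores quoting and escaping.

```java
public class IdfBaseSketch {

    /** Hypothetical base class mirroring the proposal; not the actual Sqoop class. */
    abstract static class BaseIdf<T> {
        // Not final, because setCSVTextData assigns it after construction.
        private String text;

        // Final accessors: subclasses cannot change how the CSV text is exposed.
        public final String getCSVTextData() {
            return text;
        }

        public final void setCSVTextData(String text) {
            this.text = text;
        }

        // Each implementation picks its own native representation T.
        public abstract T getData();

        public abstract void setData(T data);
    }

    /** Hypothetical custom object that encapsulates a whole row. */
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical IDF whose T is Row rather than String. */
    static final class RowIdf extends BaseIdf<Row> {
        @Override
        public Row getData() {
            // Naive split for illustration only.
            String[] parts = getCSVTextData().split(",", -1);
            return new Row(Integer.parseInt(parts[0]), parts[1]);
        }

        @Override
        public void setData(Row row) {
            setCSVTextData(row.id + "," + row.name);
        }
    }

    public static void main(String[] args) {
        RowIdf idf = new RowIdf();
        idf.setData(new Row(7, "alice"));
        System.out.println(idf.getCSVTextData());
    }
}
```

Keeping the CSV accessors final in the base class means every implementation exchanges the same string form, while getData and setData remain free to use whatever type suits the connector.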