[ 
https://issues.apache.org/jira/browse/SQOOP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580839#comment-13580839
 ] 

Hari Shreedharan commented on SQOOP-777:
----------------------------------------

Hi Jarcec,

I am sorry, I was not part of the project at that time, so I don't have much 
background on the earlier discussion. But I definitely do not agree that 
text is a good intermediate format.

I am not sure why we should be comparing against mysqldump or pg_dump, or 
whether their performance is actually due to their format. Since we are 
primarily interested in reading directly from the db (rather than from 
dumps), I don't really understand why text would perform better than a 
binary format like Avro.

Also, with text it becomes complex to encode field names and schemas (other 
than by forcing a JSON-like schema or having header-like structures).
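
To make the schema point concrete, here is a minimal Python sketch using only 
the standard library. The row, the schema and the length-prefixed binary 
layout are all made up for illustration; this is not Avro and not any actual 
Sqoop format. It only shows that a bare text line drops field names and 
types, while a schema-driven encoding can restore them.

```python
import csv
import io
import struct

# Hypothetical schema: (field name, struct type code); "s" marks a string.
SCHEMA = [("id", "q"), ("name", "s"), ("price", "d")]
# Hypothetical row matching the schema above.
ROW = (42, "widget, large", 9.5)


def encode_text(row):
    """Encode a row as one CSV line; names and types are not recorded."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def decode_text(line):
    """Decode a CSV line; every value comes back as an untyped string."""
    return next(csv.reader(io.StringIO(line)))


def encode_binary(row, schema):
    """Encode a row field by field, driven by the schema."""
    out = bytearray()
    for value, (_, code) in zip(row, schema):
        if code == "s":
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw  # length prefix, then bytes
        else:
            out += struct.pack(">" + code, value)
    return bytes(out)


def decode_binary(data, schema):
    """Decode bytes back into a dict of field name -> typed value."""
    record, offset = {}, 0
    for name, code in schema:
        if code == "s":
            (length,) = struct.unpack_from(">I", data, offset)
            offset += 4
            record[name] = data[offset:offset + length].decode("utf-8")
            offset += length
        else:
            fmt = ">" + code
            (record[name],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
    return record


if __name__ == "__main__":
    print(decode_text(encode_text(ROW)))
    print(decode_binary(encode_binary(ROW, SCHEMA), SCHEMA))
```

The text round trip returns a list of strings with no field names, so the 
reader has to get names and types from somewhere else (a header line, a 
side-channel schema, or convention). The schema-driven round trip returns 
named, typed values.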

I might be wrong on multiple fronts here, but text is inherently expensive 
anyway - so I don't see much benefit in that either.
                
> Sqoop2: Implement intermediate data format representation policy 
> -----------------------------------------------------------------
>
>                 Key: SQOOP-777
>                 URL: https://issues.apache.org/jira/browse/SQOOP-777
>             Project: Sqoop
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Hari Shreedharan
>             Fix For: 2.0.0
>
>
> We should enforce our intermediate data format policy, as currently 
> each driver can do it differently and that might break things.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
