[ https://issues.apache.org/jira/browse/SQOOP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580839#comment-13580839 ]

Hari Shreedharan commented on SQOOP-777:
----------------------------------------

Hi Jarcec,

I am sorry, I was not part of the project at that time, so I don't have much background on that discussion. But I definitely do not agree that text is a good intermediate format. I am not sure why we should be comparing against mysqldump or pg_dump, or whether their performance is actually due to their format. Since we are primarily interested in reading directly from the db (rather than from the dumps), I don't really understand why text would perform better than a binary format like Avro.

Also, by using text, it becomes complex to encode field names and schemas (other than by forcing a JSON-like schema or having header-like structures). I might be wrong on multiple fronts here, but text is inherently expensive anyway, so I don't see much benefit in that either.

> Sqoop2: Implement intermediate data format representation policy
> -----------------------------------------------------------------
>
>                 Key: SQOOP-777
>                 URL: https://issues.apache.org/jira/browse/SQOOP-777
>             Project: Sqoop
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Hari Shreedharan
>             Fix For: 2.0.0
>
>
> We should enforce our intermediate data format policy, as currently
> each driver can do it differently and that might break things.