[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547894#comment-13547894 ] Hudson commented on HIVE-2773: -- Integrated in Hive-trunk-hadoop2 #54 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/54/]) HIVE-2773: HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output (Francis Liu via Ashutosh Chauhan) (Revision 1304167) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1304167 Files : * /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Fix For: 0.9.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2773.D1815.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2773.D2007.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2773.D2415.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236772#comment-13236772 ] Hudson commented on HIVE-2773: -- Integrated in Hive-trunk-h0.21 #1328 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1328/]) HIVE-2773: HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output (Francis Liu via Ashutosh Chauhan) (Revision 1304167) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1304167 Files : * /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Fix For: 0.9.0 Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236024#comment-13236024 ] Phabricator commented on HIVE-2773: --- ashutoshc has accepted the revision HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output. +1 will commit if tests pass. REVISION DETAIL https://reviews.facebook.net/D2415 BRANCH configure2 HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.D2415.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224942#comment-13224942 ] Phabricator commented on HIVE-2773: --- ashutoshc has requested changes to the revision HIVE-2773 [jira] HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output. Francis, Normally, we first apply patches to trunk and then potentially backport them to a branch. Can, you please first generate a patch for trunk? Overall, patch looks good. I left couple of comments. Also, lint is complaining for few lines being too long. Can you take care of those one too? INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java:698 Should this be NoSuchMethodError ? ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java:708 Should this be NoSuchMethodError ? REVISION DETAIL https://reviews.facebook.net/D2007 BRANCH arcpatch-D1815 HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.D2007.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214835#comment-13214835 ] Francis Liu commented on HIVE-2773: --- Since it's an interface, things will still break not unless they extended DefaultStorageHandler. In any case will ask user@hive. HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214854#comment-13214854 ] Alexander Lorenz-Alten commented on HIVE-2773: -- Since Hive need the new methods to work with HCatalog the interface should be extended. Later it can be dropped or marked as obsolete. For a flumeNG hive sink the new methods are a good improvement. HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214962#comment-13214962 ] Francis Liu commented on HIVE-2773: --- If I add new methods, classes implementing the interface will have to implement the new methods anyway, why not just take the extra step of having them do a very little bit of refactoring (see my changes to HBaseStorageHandler)? Of course if most users are extending DefaultStorageHandler then doing as you say would be a smoother way to go about it. HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214979#comment-13214979 ] Ashutosh Chauhan commented on HIVE-2773: bq. If I add new methods, classes implementing the interface will have to implement the new methods anyway, Adding or removing methods have different impacts from point of view of backward compatibility. If you add new methods in an interface then implementations using the older interface will continue to work even after they upgrade to new version of hive. That has no cost on people who have implemented the old interface. Only if they recompile it, then they have to implement new methods. On the other hand, if you remove methods, then their old implementation won't work anymore after upgrade. They have to recompile and this implies work on their part. This is the difference which gets factored in backward compatibility. So, as long as you are adding new methods in interface, I consider it to be backward compatible. But, if you are removing methods, then its not. Makes sense? HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Assignee: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2773) HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output
[ https://issues.apache.org/jira/browse/HIVE-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214018#comment-13214018 ] Ashutosh Chauhan commented on HIVE-2773: This will effectively break backwards compatibility since you are removing a method from an interface. You can get around this by keeping original method as well as introducing new ones. Then, at the time of invocation of these methods, you can check if newer methods exists, if they do then invoke them else invoke old method. Then, we can deprecate old method. But, before you go down that path and introduce code complexity I suggest you send an email on user@hive to check with people if they are ok with your current approach as is and will be able to upgrade their storage handlers, in which case we can go ahead with this current approach. HiveStorageHandler.configureTableJobProperites() should let the handler know wether it is configuration for input or output --- Key: HIVE-2773 URL: https://issues.apache.org/jira/browse/HIVE-2773 Project: Hive Issue Type: Improvement Reporter: Francis Liu Labels: hcatalog, storage_handler Attachments: HIVE-2773.D1815.1.patch, HIVE-2773.patch HiveStorageHandler.configureTableJobProperties() is called to allow the storage handler to setup any properties that the underlying inputformat/outputformat/serde may need. But the handler implementation does not know whether it is being called for configuring input or output. This makes it a problem for handlers which sets an external state. In the case of HCatalog's HBase storageHandler, whenever a write needs to be configured we create a write transaction which needs to be committed or aborted later on. In this case configuring for both input and output each time configureTableJobProperties() is called would not be desirable. This has become an issue since HCatalog is dropping storageDrivers for SerDe and StorageHandler (see HCATALOG-237). My proposal is to replace configureTableJobProperties() with two methods: configureInputJobProperties() configureOutputJobProperties() Each method will have the same signature. I cursory look at the code and I believe changes should be straighforward also given that we are not really changing anything just splitting responsibility. If the community is fine with this approach I will go ahead and create a aptch. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira