[jira] [Work logged] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true
[ https://issues.apache.org/jira/browse/HIVE-23954?focusedWorklogId=504524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504524 ] ASF GitHub Bot logged work on HIVE-23954: - Author: ASF GitHub Bot Created on: 25/Oct/20 01:01 Start Date: 25/Oct/20 01:01 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1414: URL: https://github.com/apache/hive/pull/1414#issuecomment-716076818 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 504524) Time Spent: 40m (was: 0.5h) > count(*) with count(distinct) gives wrong results with > hive.optimize.countdistinct=true > --- > > Key: HIVE-23954 > URL: https://issues.apache.org/jira/browse/HIVE-23954 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eugene Chung >Assignee: Eugene Chung >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23954.01.patch, HIVE-23954.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > {code:java} > select count(*), count(distinct mid) from db1.table1 where partitioned_column > = '...'{code} > > returns wrong results when hive.optimize.countdistinct is true, which is the > default in all 3.x versions. > In the two plans below, the aggregations in the Output of the Group By > Operator of Map 1 differ. > > - hive.optimize.countdistinct=false > {code:java} > ++ > | Explain | > ++ > | Plan optimized by CBO.
| > || > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (SIMPLE_EDGE) | > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2| > | File Output Operator [FS_7] | > | Group By Operator [GBY_5] (rows=1 width=24) | > | > Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT > KEY._col0:0._col0)"] | > | <-Map 1 [SIMPLE_EDGE] | > | SHUFFLE [RS_4] | > | Group By Operator [GBY_3] (rows=343640771 width=4160) | > | > Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT > mid)"],keys:mid | > | Select Operator [SEL_2] (rows=343640771 width=4160) | > | Output:["mid"] | > | TableScan [TS_0] (rows=343640771 width=4160) | > | db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] | > || > ++{code} > > - hive.optimize.countdistinct=true > {code:java} > ++ > | Explain | > ++ > | Plan optimized by CBO. | > || > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (SIMPLE_EDGE) | > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2| > | File Output Operator [FS_7] | > | Group By Operator [GBY_14] (rows=1 width=16) | > | > Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] | > |
[jira] [Work logged] (HIVE-23829) Compute Stats Incorrect for Binary Columns
[ https://issues.apache.org/jira/browse/HIVE-23829?focusedWorklogId=504525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504525 ] ASF GitHub Bot logged work on HIVE-23829: - Author: ASF GitHub Bot Created on: 25/Oct/20 01:02 Start Date: 25/Oct/20 01:02 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1313: URL: https://github.com/apache/hive/pull/1313#issuecomment-716076826 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. Issue Time Tracking --- Worklog Id: (was: 504525) Time Spent: 1.5h (was: 1h 20m) > Compute Stats Incorrect for Binary Columns > -- > > Key: HIVE-23829 > URL: https://issues.apache.org/jira/browse/HIVE-23829 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > I came across an issue when working on [HIVE-22674]. > The SerDe used for processing binary data tries to auto-detect whether the data is > in Base-64. It uses > {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}}, which has two > issues: > # It is slow, since it first checks whether the array is compatible and then > processes the data (examining the array twice) > # More importantly, this method _Tests a given byte array to see if it > contains only valid characters within the Base64 alphabet.
Currently the > method treats whitespace as valid._ > https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A- > The > [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q] > for this feature uses full sentences (which include spaces) > [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt] > and therefore the SerDe treats this data as Base-64 and returns an incorrect > size estimate. > The SerDe should not auto-detect Base64 data; instead, Base64 handling should be > enabled with a table property. -- This message was sent by Atlassian Jira (v8.3.4#803005)
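A minimal sketch of why whitespace tolerance misfires, assuming only the behavior quoted above: characters outside the Base64 alphabet fail the check, but whitespace is accepted. The class `Base64Sniff` and its methods are hypothetical, hand-written approximations, not the commons-codec implementation or the Hive SerDe code. The strict check uses the JDK's `java.util.Base64` basic decoder, which rejects any character outside the Base64 alphabet, including spaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helpers contrasting a whitespace-tolerant Base64 sniff with a strict decode. */
public final class Base64Sniff {

    private Base64Sniff() {}

    // Standard Base64 alphabet plus the '=' padding character.
    private static boolean isBase64Char(byte b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
    }

    // Whitespace that the lenient check accepts as "valid".
    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    /** Lenient sniff: true if every byte is a Base64 character or whitespace. */
    public static boolean looksLikeBase64Lenient(byte[] data) {
        for (byte b : data) {
            if (!isBase64Char(b) && !isWhitespace(b)) {
                return false;
            }
        }
        return true;
    }

    /** Strict check: the JDK basic decoder throws on any non-alphabet character. */
    public static boolean decodesStrictly(byte[] data) {
        try {
            Base64.getDecoder().decode(data);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] sentence = "this is a plain sentence".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeBase64Lenient(sentence)); // true: spaces are skipped
        System.out.println(decodesStrictly(sentence));        // false: ' ' is rejected
    }
}
```

Under the lenient rule, a sentence made only of letters and spaces is indistinguishable from Base64, which is the kind of misdetection described above; an opt-in table property, as proposed, avoids guessing altogether.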
[jira] [Work logged] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
[ https://issues.apache.org/jira/browse/HIVE-24067?focusedWorklogId=504523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504523 ] ASF GitHub Bot logged work on HIVE-24067: - Author: ASF GitHub Bot Created on: 25/Oct/20 01:01 Start Date: 25/Oct/20 01:01 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1425: URL: https://github.com/apache/hive/pull/1425#issuecomment-716076816 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. Issue Time Tracking --- Worklog Id: (was: 504523) Time Spent: 0.5h (was: 20m) > TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop > > > Key: HIVE-24067 > URL: https://issues.apache.org/jira/browse/HIVE-24067 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24067.01.patch, HIVE-24067.02.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In TestReplicationScenariosExclusiveReplica, the drop database operation > for the primary DB fails with a wrong FS error because the ReplChangeManager is > associated with the replica FS.
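To illustrate what a "wrong FS" error means, here is a hypothetical sketch in which a file-system handle bound to one URI rejects paths whose scheme or authority belong to another, as happens when a component tied to the replica FS is handed a primary-cluster path. It is only loosely modeled on the path check Hadoop file systems perform; the class `WrongFsCheck`, the host names, and the exact message format are assumptions, and this is not the Hive test or `ReplChangeManager` code.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical sketch: a handle bound to one file system rejects paths from another. */
public final class WrongFsCheck {

    private final URI fsUri; // scheme + authority this "file system" serves

    public WrongFsCheck(String fsUri) {
        this.fsUri = URI.create(fsUri);
    }

    /** Throws if the path's scheme or authority differ from this file system's. */
    public void checkPath(String path) {
        URI p = URI.create(path);
        // Scheme-less (relative or absolute local-style) paths are assumed to belong here.
        if (p.getScheme() == null) {
            return;
        }
        boolean sameScheme = p.getScheme().equalsIgnoreCase(fsUri.getScheme());
        boolean sameAuthority = Objects.equals(p.getAuthority(), fsUri.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        WrongFsCheck replicaFs = new WrongFsCheck("hdfs://replica-nn:8020");
        replicaFs.checkPath("hdfs://replica-nn:8020/warehouse/db1"); // accepted
        try {
            // A primary-cluster path handed to the replica-bound handle.
            replicaFs.checkPath("hdfs://primary-nn:8020/warehouse/db1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```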
[jira] [Work logged] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?focusedWorklogId=504451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504451 ] ASF GitHub Bot logged work on HIVE-24311: - Author: ASF GitHub Bot Created on: 24/Oct/20 11:06 Start Date: 24/Oct/20 11:06 Worklog Time Spent: 10m Work Description: tkanng commented on pull request #1609: URL: https://github.com/apache/hive/pull/1609#issuecomment-715899122 The failed unit tests are not related to this patch. Could anybody help review the code? Thanks! Issue Time Tracking --- Worklog Id: (was: 504451) Time Spent: 20m (was: 10m) > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object. > Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM. > > Therefore, I am submitting a patch to reset the read blocks for > `Rowcontainer`, just like `PTFRowcontainer` does.
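The memory argument can be sketched with a toy container. Only the field names `currentReadBlock`, `currentWriteBlock`, `addCursor`, and `itrCursor` come from the issue; the class `ToyRowContainer`, its block handling, and both clear methods are hypothetical simplifications, not Hive's actual row container code. Before the first flush the read block and the write block are the same object; after a simulated flush they diverge, so clearing only the cursors and the write block leaves the old read block reachable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy container illustrating the read-block retention described in HIVE-24311. */
public final class ToyRowContainer {

    private final int blockSize;
    private List<Object[]> currentWriteBlock = new ArrayList<>();
    // Before any flush, reads are served from the write block itself.
    private List<Object[]> currentReadBlock = currentWriteBlock;
    private int addCursor;
    private int itrCursor;

    public ToyRowContainer(int blockSize) {
        this.blockSize = blockSize;
    }

    public void addRow(Object[] row) {
        if (currentWriteBlock.size() == blockSize) {
            flush();
        }
        currentWriteBlock.add(row);
        addCursor++;
    }

    // Stand-in for a spill to disk: the full block stays referenced as the
    // read block, and a fresh write block is allocated.
    private void flush() {
        currentReadBlock = currentWriteBlock;
        currentWriteBlock = new ArrayList<>();
    }

    /** Roughly the behavior described in the issue: cursors reset, old read block still reachable. */
    public void clearRowsCursorsOnly() {
        addCursor = 0;
        itrCursor = 0;
        currentWriteBlock.clear();
    }

    /** Proposed behavior: also point the read block back at the (empty) write block. */
    public void clearRows() {
        clearRowsCursorsOnly();
        currentReadBlock = currentWriteBlock;
    }

    /** Rows still held by the read block, a proxy for retained memory. */
    public int retainedReadRows() {
        return currentReadBlock.size();
    }

    public static void main(String[] args) {
        ToyRowContainer c = new ToyRowContainer(1000);
        for (int i = 0; i < 1500; i++) {
            c.addRow(new Object[] {i});
        }
        c.clearRowsCursorsOnly();
        System.out.println(c.retainedReadRows()); // 1000
        c.clearRows();
        System.out.println(c.retainedReadRows()); // 0
    }
}
```

Resetting the reference rather than clearing the read block in place matters here: the old block becomes unreachable as a whole and can be garbage-collected.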
[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=504423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504423 ] ASF GitHub Bot logged work on HIVE-23935: - Author: ASF GitHub Bot Created on: 24/Oct/20 09:30 Start Date: 24/Oct/20 09:30 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1605: URL: https://github.com/apache/hive/pull/1605#discussion_r511362503 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java ## @@ -1124,6 +1125,40 @@ public void testEmptyTrustStoreProps() { setAndCheckSSLProperties(true, "", "", "jks"); } + /** + * Tests getPrimaryKeys() when db_name isn't specified. + */ + @Test + public void testGetPrimaryKeys() throws Exception { Review comment: You are right that it's part of ObjectStore, but the TestPrimaryKey class was created so that all sorts of primary key tests live in one place.
Issue Time Tracking --- Worklog Id: (was: 504423) Time Spent: 1h 50m (was: 1h 40m) > Fetching primaryKey through beeline fails with NPE > -- > > Key: HIVE-23935 > URL: https://issues.apache.org/jira/browse/HIVE-23935 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Fetching the primary key of a table through Beeline's !primarykeys command fails with an NPE > {noformat} > 0: jdbc:hive2://localhost:1> !primarykeys Persons > Error: MetaException(message:java.lang.NullPointerException) (state=,code=0) > org.apache.hive.service.cli.HiveSQLException: > MetaException(message:java.lang.NullPointerException) > at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:360) > at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:351) > at > org.apache.hive.jdbc.HiveDatabaseMetaData.getPrimaryKeys(HiveDatabaseMetaData.java:573) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hive.beeline.Reflector.invoke(Reflector.java:89) > at org.apache.hive.beeline.Commands.metadata(Commands.java:125) > at org.apache.hive.beeline.Commands.primarykeys(Commands.java:231) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57) > at >
org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1465) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1364) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082) > at > org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > at org.apache.hadoop.util.RunJar.main(RunJar.java:236){noformat}
[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=504422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504422 ] ASF GitHub Bot logged work on HIVE-23935: - Author: ASF GitHub Bot Created on: 24/Oct/20 09:28 Start Date: 24/Oct/20 09:28 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1605: URL: https://github.com/apache/hive/pull/1605#discussion_r511361890 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java ## @@ -10833,7 +10833,8 @@ public FileMetadataHandler getFileMetadataHandler(FileMetadataExprType type) { final String db_name_input, final String tbl_name_input) throws MetaException, NoSuchObjectException { -final String db_name = normalizeIdentifier(db_name_input); +final String db_name = Review comment: @ayushtkn 1. You haven't introduced any new variable, but you could rename the other variables as well in this PR for consistency. 2. For catName, dbName and tableName, it's better to check and normalize everywhere; that helps ensure Hive treats names case-insensitively.
Issue Time Tracking --- Worklog Id: (was: 504422) Time Spent: 1h 40m (was: 1.5h) > Fetching primaryKey through beeline fails with NPE > -- > > Key: HIVE-23935 > URL: https://issues.apache.org/jira/browse/HIVE-23935 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Fetching the primary key of a table through Beeline's !primarykeys command fails with an NPE > {noformat} > 0: jdbc:hive2://localhost:1> !primarykeys Persons > Error: MetaException(message:java.lang.NullPointerException) (state=,code=0) > org.apache.hive.service.cli.HiveSQLException: > MetaException(message:java.lang.NullPointerException) > at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:360) > at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:351) > at > org.apache.hive.jdbc.HiveDatabaseMetaData.getPrimaryKeys(HiveDatabaseMetaData.java:573) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hive.beeline.Reflector.invoke(Reflector.java:89) > at org.apache.hive.beeline.Commands.metadata(Commands.java:125) > at org.apache.hive.beeline.Commands.primarykeys(Commands.java:231) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57) > at > org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1465)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1364) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082) > at > org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > at org.apache.hadoop.util.RunJar.main(RunJar.java:236){noformat}
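The review above concerns normalizing `db_name_input`, and the new test covers `getPrimaryKeys()` when db_name isn't specified. The sketch below assumes that normalization trims and lower-cases the name; it shows how such a helper throws a NullPointerException for a missing name and how a null-safe wrapper avoids it. `IdentifierNormalizer` and both methods are hypothetical, and how the actual patch treats a null database name is not shown in this thread.

```java
import java.util.Locale;

/** Hypothetical sketch: a trim-and-lowercase normalizer and a null-safe wrapper. */
public final class IdentifierNormalizer {

    private IdentifierNormalizer() {}

    /** Assumed plain normalization: dereferences its argument, so null throws an NPE. */
    public static String normalize(String identifier) {
        return identifier.trim().toLowerCase(Locale.ROOT);
    }

    /** Null-safe variant: a missing name stays null, and the caller decides what that means. */
    public static String normalizeOrNull(String identifier) {
        return identifier == null ? null : normalize(identifier);
    }

    public static void main(String[] args) {
        System.out.println(normalizeOrNull("  Persons ")); // persons
        System.out.println(normalizeOrNull(null));         // null
        try {
            normalize(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for a missing name");
        }
    }
}
```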
[jira] [Commented] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220022#comment-17220022 ] Qiang.Kang commented on HIVE-24311: --- I just added a patch for this issue. Could anybody help review the code? Thanks! > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object. > Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM. > > Therefore, I am submitting a patch to reset the read blocks for > `Rowcontainer`, just like `PTFRowcontainer` does.
[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24311: -- Labels: pull-request-available (was: ) > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object. > Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM. > > Therefore, I am submitting a patch to reset the read blocks for > `Rowcontainer`, just like `PTFRowcontainer` does.
[jira] [Work logged] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?focusedWorklogId=504417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504417 ] ASF GitHub Bot logged work on HIVE-24311: - Author: ASF GitHub Bot Created on: 24/Oct/20 07:38 Start Date: 24/Oct/20 07:38 Worklog Time Spent: 10m Work Description: tkanng opened a new pull request #1609: URL: https://github.com/apache/hive/pull/1609 ### What changes were proposed in this pull request? Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as `addCursor` and `itrCursor`, without resetting the read blocks. `currentReadBlock` and `currentWriteBlock` account for most of the memory used by a `Rowcontainer` and can be very large, depending on the data pattern. After the rowcontainer has flushed data to disk, `currentReadBlock` and `currentWriteBlock` are no longer the same object. Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will prevent OOM. Therefore, I am submitting a patch to reset the read blocks for `Rowcontainer`, just like `PTFRowcontainer` does. ### Why are the changes needed? Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will prevent OOM. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests. Issue Time Tracking --- Worklog Id: (was: 504417) Remaining Estimate: 0h Time Spent: 10m > Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object. > Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM. > > Therefore, I am submitting a patch to reset the read blocks for > `Rowcontainer`, just like `PTFRowcontainer` does.
[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang.Kang updated HIVE-24311: -- Description: Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as `addCursor` and `itrCursor`, without resetting the read blocks. `currentReadBlock` and `currentWriteBlock` account for most of the memory used by a `Rowcontainer` and can be very large, depending on the data pattern. After the rowcontainer has flushed data to disk, `currentReadBlock` and `currentWriteBlock` are no longer the same object. Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will prevent OOM. Therefore, I am submitting a patch to reset the read blocks for `Rowcontainer`, just like `PTFRowcontainer` does. was: Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as `addCursor` and `itrCursor`, without resetting the read blocks. `currentReadBlock` and `currentWriteBlock` account for most of the memory used by a `Rowcontainer` and can be very large, depending on the data pattern. After the rowcontainer has flushed data to disk, `currentReadBlock` and `currentWriteBlock` are no longer the same object. Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will prevent OOM. > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object.
> Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM. > > Therefore, I am submitting a patch to reset the read blocks for > `Rowcontainer`, just like `PTFRowcontainer` does.
[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang.Kang updated HIVE-24311: -- Description: Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as `addCursor` and `itrCursor`, without resetting the read blocks. `currentReadBlock` and `currentWriteBlock` account for most of the memory used by a `Rowcontainer` and can be very large, depending on the data pattern. After the rowcontainer has flushed data to disk, `currentReadBlock` and `currentWriteBlock` are no longer the same object. Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will prevent OOM. > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major > > Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as > `addCursor` and `itrCursor`, without resetting the read blocks. > `currentReadBlock` and `currentWriteBlock` account for most of the memory used > by a `Rowcontainer` and can be very large, depending on the data pattern. > After the rowcontainer has flushed data to disk, `currentReadBlock` and > `currentWriteBlock` are no longer the same object. > Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will > prevent OOM.
[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang.Kang updated HIVE-24311: -- Summary: Rowcontainer should reset readBlocks when we clear rows to prevent OOM. (was: Rowcontainer should also reset readBlocks when it clear rows to prevent OOM.) > Rowcontainer should reset readBlocks when we clear rows to prevent OOM. > --- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major >
[jira] [Updated] (HIVE-24311) Rowcontainer should also reset readBlocks when it clear rows to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang.Kang updated HIVE-24311: -- Summary: Rowcontainer should also reset readBlocks when it clear rows to prevent OOM. (was: When we clear rows in a Rowcontainer, readBlocks should also be reset to prevent OOM.) > Rowcontainer should also reset readBlocks when it clear rows to prevent OOM. > > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major >
[jira] [Assigned] (HIVE-24311) When we clear rows in a Rowcontainer, readBlocks should also be reset to prevent OOM.
[ https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang.Kang reassigned HIVE-24311: - > When we clear rows in a Rowcontainer, readBlocks should also be reset to > prevent OOM. > -- > > Key: HIVE-24311 > URL: https://issues.apache.org/jira/browse/HIVE-24311 > Project: Hive > Issue Type: Bug >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Major >
[jira] [Updated] (HIVE-23685) Removing user's extra resources while executing File Merge Task
[ https://issues.apache.org/jira/browse/HIVE-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23685: -- Labels: pull-request-available (was: ) > Removing user's extra resources while executing File Merge Task > --- > > Key: HIVE-23685 > URL: https://issues.apache.org/jira/browse/HIVE-23685 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer, Query Planning >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23685.02.patch, HIVE-23685.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Hi, we found that MapReduce's file merge map containers download the user's > extra resources (such as added jars, files, and archives) before launching the task. > When these resources are large or the network is busy, file merge jobs can > time out, causing the query to fail. The file merge task runs correctly > with just hive-exec.jar and the MapReduce framework, so there is no need to > download the user's resources. The patch below prevents setting `tmpjars` for > the FileMerge Task.
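A minimal sketch of the idea, assuming a plain `Map` stands in for the job configuration: the merge job gets a copy of the query's settings with the `tmpjars` entry removed, so there are no user jars for its containers to fetch. `MergeTaskConf` is hypothetical, the jar path and queue setting are illustrative values, and this is not the code in the patch, which works on Hive's own task setup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: derive file-merge job settings without the user's extra jars. */
public final class MergeTaskConf {

    /** Key named in the issue; assumed to hold the user jars shipped with the job. */
    static final String TMPJARS = "tmpjars";

    private MergeTaskConf() {}

    /** Returns a copy of the query's settings with the user jars removed for the merge job. */
    public static Map<String, String> forFileMerge(Map<String, String> queryConf) {
        Map<String, String> mergeConf = new HashMap<>(queryConf);
        mergeConf.remove(TMPJARS);
        return mergeConf;
    }

    public static void main(String[] args) {
        Map<String, String> queryConf = new HashMap<>();
        queryConf.put("tmpjars", "file:///tmp/udfs-large.jar"); // illustrative value
        queryConf.put("mapreduce.job.queuename", "default");     // illustrative value
        Map<String, String> mergeConf = forFileMerge(queryConf);
        System.out.println(mergeConf.containsKey("tmpjars")); // false
        System.out.println(queryConf.containsKey("tmpjars")); // true: original untouched
    }
}
```

Copying instead of mutating keeps the query's own jobs, which may genuinely need the user's jars, unaffected.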
[jira] [Work logged] (HIVE-23685) Removing user's extra resources while executing File Merge Task
[ https://issues.apache.org/jira/browse/HIVE-23685?focusedWorklogId=504414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504414 ] ASF GitHub Bot logged work on HIVE-23685: - Author: ASF GitHub Bot Created on: 24/Oct/20 06:33 Start Date: 24/Oct/20 06:33 Worklog Time Spent: 10m Work Description: tkanng opened a new pull request #1608: URL: https://github.com/apache/hive/pull/1608 Hi, we found that MapReduce's file merge map containers download the user's extra resources (such as added jars, files, and archives) before launching the task. When these resources are large or the network is busy, file merge jobs can time out, causing the query to fail. The file merge task runs correctly with just hive-exec.jar and the MapReduce framework, so there is no need to download the user's resources. The patch below prevents setting `tmpjars` for the FileMerge Task. ### What changes were proposed in this pull request? Remove `tmpjars` for `MergeFileTask`. ### Why are the changes needed? For queries with large jars, it is unnecessary for `MergeFileTask` to download the extra resources, and when the network is slow, containers might time out. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? It has been tested in our production environment.
Issue Time Tracking --- Worklog Id: (was: 504414) Remaining Estimate: 0h Time Spent: 10m > Removing user's extra resources while executing File Merge Task > --- > > Key: HIVE-23685 > URL: https://issues.apache.org/jira/browse/HIVE-23685 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer, Query Planning >Affects Versions: All Versions >Reporter: Qiang.Kang >Assignee: Qiang.Kang >Priority: Critical > Attachments: HIVE-23685.02.patch, HIVE-23685.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Hi, we found that MapReduce's file merge map containers download the user's > extra resources (such as added jars, files, and archives) before launching the task. > When these resources are large or the network is busy, file merge jobs can > time out, causing the query to fail. The file merge task runs correctly > with just hive-exec.jar and the MapReduce framework, so there is no need to > download the user's resources. The patch below prevents setting `tmpjars` for > the FileMerge Task.