[jira] [Work logged] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23954?focusedWorklogId=504524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504524
 ]

ASF GitHub Bot logged work on HIVE-23954:
-

Author: ASF GitHub Bot
Created on: 25/Oct/20 01:01
Start Date: 25/Oct/20 01:01
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1414:
URL: https://github.com/apache/hive/pull/1414#issuecomment-716076818


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 504524)
Time Spent: 40m  (was: 0.5h)

> count(*) with count(distinct) gives wrong results with 
> hive.optimize.countdistinct=true
> ---
>
> Key: HIVE-23954
> URL: https://issues.apache.org/jira/browse/HIVE-23954
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eugene Chung
>Assignee: Eugene Chung
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23954.01.patch, HIVE-23954.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> select count(*), count(distinct mid) from db1.table1 where partitioned_column = '...'{code}
>  
> The query above gives wrong results when hive.optimize.countdistinct is true, 
> which is the default for all 3.x versions.
> In the two plans below, the aggregations in the Output of the Group By 
> Operator of Map 1 differ.
>  
> - hive.optimize.countdistinct=false
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_5] (rows=1 width=24) |
> |   Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"] |
> | <-Map 1 [SIMPLE_EDGE]  |
> |   SHUFFLE [RS_4]   |
> | Group By Operator [GBY_3] (rows=343640771 width=4160) |
> |   Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT mid)"],keys:mid |
> |   Select Operator [SEL_2] (rows=343640771 width=4160) |
> | Output:["mid"] |
> | TableScan [TS_0] (rows=343640771 width=4160) |
> |   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
> ||
> ++{code}
>  
> - hive.optimize.countdistinct=true
> {code:java}
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2|
> |   File Output Operator [FS_7]  |
> | Group By Operator [GBY_14] (rows=1 width=16) |
> |   Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
> |   

[jira] [Work logged] (HIVE-23829) Compute Stats Incorrect for Binary Columns

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23829?focusedWorklogId=504525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504525
 ]

ASF GitHub Bot logged work on HIVE-23829:
-

Author: ASF GitHub Bot
Created on: 25/Oct/20 01:02
Start Date: 25/Oct/20 01:02
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-716076826


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 504525)
Time Spent: 1.5h  (was: 1h 20m)

> Compute Stats Incorrect for Binary Columns
> --
>
> Key: HIVE-23829
> URL: https://issues.apache.org/jira/browse/HIVE-23829
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I came across an issue when working on [HIVE-22674].
> The SerDe used for processing binary data tries to auto-detect whether the 
> data is Base64-encoded. It uses 
> {{org.apache.commons.codec.binary.Base64#isArrayByteBase64}}, which has two 
> issues:
> # It's slow, since it first checks whether the array is compatible and then 
> processes the data (it examines the array twice).
> # More importantly, this method _Tests a given byte array to see if it 
> contains only valid characters within the Base64 alphabet. Currently the 
> method treats whitespace as valid._
> https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isArrayByteBase64-byte:A-
> The 
> [qtest|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/ql/src/test/queries/clientpositive/compute_stats_binary.q]
>  for this feature uses full sentences (which include spaces) 
> [here|https://github.com/apache/hive/blob/f98e136bdd5642e3de10d2fd1a4c14d1d6762113/data/files/binary.txt]
>  and therefore the SerDe treats this data as Base64 and returns an incorrect 
> size estimate.
> The SerDe should not auto-detect Base64 data; Base64 handling should instead 
> be enabled with a table property.
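
The false positive described above can be reproduced with a minimal sketch. The class and method below are hypothetical stand-ins written for illustration; they are not the Commons Codec implementation, and only mirror the quoted contract (bytes from the Base64 alphabet, the pad character, or whitespace are accepted).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the alphabet check quoted above; NOT Commons Codec code.
final class Base64SniffSketch {

    // Returns true when every byte is a Base64 alphabet character, '=' padding,
    // or whitespace. Whitespace is accepted on purpose, to mirror the quoted Javadoc.
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            char c = (char) (b & 0xFF);
            boolean inAlphabet = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
            boolean whitespace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (!inAlphabet && !whitespace) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A plain English sentence made of letters and spaces passes the check,
        // even though it was never Base64-encoded.
        byte[] sentence = "the quick brown fox".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeBase64(sentence));
    }
}
```

A check of this shape cannot tell prose from encoded data, which is why an explicit table property is a safer switch than auto-detection.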



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24067) TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24067?focusedWorklogId=504523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504523
 ]

ASF GitHub Bot logged work on HIVE-24067:
-

Author: ASF GitHub Bot
Created on: 25/Oct/20 01:01
Start Date: 25/Oct/20 01:01
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1425:
URL: https://github.com/apache/hive/pull/1425#issuecomment-716076816


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 504523)
Time Spent: 0.5h  (was: 20m)

> TestReplicationScenariosExclusiveReplica - Wrong FS error during DB drop
> 
>
> Key: HIVE-24067
> URL: https://issues.apache.org/jira/browse/HIVE-24067
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24067.01.patch, HIVE-24067.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In TestReplicationScenariosExclusiveReplica, the drop database operation 
> for the primary db fails with a Wrong FS error because the ReplChangeManager 
> is associated with the replica FS.





[jira] [Work logged] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?focusedWorklogId=504451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504451
 ]

ASF GitHub Bot logged work on HIVE-24311:
-

Author: ASF GitHub Bot
Created on: 24/Oct/20 11:06
Start Date: 24/Oct/20 11:06
Worklog Time Spent: 10m 
  Work Description: tkanng commented on pull request #1609:
URL: https://github.com/apache/hive/pull/1609#issuecomment-715899122


   The failed unit tests are not related to this patch. Could anybody help 
review the code? Thanks!





Issue Time Tracking
---

Worklog Id: (was: 504451)
Time Spent: 20m  (was: 10m)

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as 
> `addCursor`, `itrCursor`, etc., without resetting the read blocks.
> `currentReadBlock` and `currentWriteBlock` account for most of the memory 
> used by a `Rowcontainer` and might be very large, depending on the data 
> pattern.
> `currentReadBlock` and `currentWriteBlock` won't be the same object after the 
> `Rowcontainer` has flushed data to disk.
> Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will 
> prevent OOM.
>  
> Therefore, I have submitted a patch to reset the read blocks for `Rowcontainer`, 
> just as `PTFRowcontainer` does.
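
The idea in the description can be sketched with a toy container. Everything below is a simplified, hypothetical model: the field names come from the issue text, but the class, the block size, the in-memory "spill", and `retainedRows()` are assumptions made for illustration and do not reflect Hive's actual `RowContainer`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; not Hive's RowContainer.
final class RowContainerSketch {
    private static final int BLOCK_SIZE = 2; // assumed tiny block size for the demo

    private List<String> currentWriteBlock = new ArrayList<>();
    private List<String> currentReadBlock = currentWriteBlock;
    private int addCursor;
    private int itrCursor;

    void addRow(String row) {
        if (addCursor == BLOCK_SIZE) {
            // Pretend the full block was spilled to disk: the read side keeps its
            // own buffer, so the two fields no longer point at the same object.
            currentReadBlock = new ArrayList<>(currentWriteBlock);
            currentWriteBlock = new ArrayList<>();
            addCursor = 0;
        }
        currentWriteBlock.add(row);
        addCursor++;
    }

    // Resets the cursors AND drops both blocks, so cleared rows become collectable.
    void clearRows() {
        addCursor = 0;
        itrCursor = 0;
        currentWriteBlock = new ArrayList<>();
        currentReadBlock = currentWriteBlock;
    }

    // Number of row objects still reachable through the two blocks.
    int retainedRows() {
        return currentReadBlock == currentWriteBlock
                ? currentWriteBlock.size()
                : currentReadBlock.size() + currentWriteBlock.size();
    }

    public static void main(String[] args) {
        RowContainerSketch rc = new RowContainerSketch();
        rc.addRow("r1");
        rc.addRow("r2");
        rc.addRow("r3"); // triggers the simulated spill
        System.out.println(rc.retainedRows());
        rc.clearRows();
        System.out.println(rc.retainedRows());
    }
}
```

If `clearRows()` reset only the cursors, the buffers would stay reachable after a clear, which is the memory retention the issue describes.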





[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=504423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504423
 ]

ASF GitHub Bot logged work on HIVE-23935:
-

Author: ASF GitHub Bot
Created on: 24/Oct/20 09:30
Start Date: 24/Oct/20 09:30
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1605:
URL: https://github.com/apache/hive/pull/1605#discussion_r511362503



##
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java
##
@@ -1124,6 +1125,40 @@ public void testEmptyTrustStoreProps() {
 setAndCheckSSLProperties(true, "", "", "jks");
   }
 
+  /**
+   * Tests getPrimaryKeys() when db_name isn't specified.
+   */
+  @Test
+  public void testGetPrimaryKeys() throws Exception {

Review comment:
   You are right that it's part of ObjectStore, but the TestPrimaryKey class was 
created so that we can have all sorts of tests for primary keys in one place.







Issue Time Tracking
---

Worklog Id: (was: 504423)
Time Spent: 1h 50m  (was: 1h 40m)

> Fetching primaryKey through beeline fails with NPE
> --
>
> Key: HIVE-23935
> URL: https://issues.apache.org/jira/browse/HIVE-23935
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Fetching PrimaryKey of a table through Beeline !primarykey fails with NPE
> {noformat}
> 0: jdbc:hive2://localhost:1> !primarykeys Persons
> Error: MetaException(message:java.lang.NullPointerException) (state=,code=0)
> org.apache.hive.service.cli.HiveSQLException: MetaException(message:java.lang.NullPointerException)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:360)
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:351)
>   at org.apache.hive.jdbc.HiveDatabaseMetaData.getPrimaryKeys(HiveDatabaseMetaData.java:573)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.beeline.Reflector.invoke(Reflector.java:89)
>   at org.apache.hive.beeline.Commands.metadata(Commands.java:125)
>   at org.apache.hive.beeline.Commands.primarykeys(Commands.java:231)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57)
>   at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1465)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1504)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1364)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1134)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236){noformat}





[jira] [Work logged] (HIVE-23935) Fetching primaryKey through beeline fails with NPE

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23935?focusedWorklogId=504422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504422
 ]

ASF GitHub Bot logged work on HIVE-23935:
-

Author: ASF GitHub Bot
Created on: 24/Oct/20 09:28
Start Date: 24/Oct/20 09:28
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1605:
URL: https://github.com/apache/hive/pull/1605#discussion_r511361890



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -10833,7 +10833,8 @@ public FileMetadataHandler getFileMetadataHandler(FileMetadataExprType type) {
  final String db_name_input,
  final String tbl_name_input)
   throws MetaException, NoSuchObjectException {
-final String db_name = normalizeIdentifier(db_name_input);
+final String db_name =

Review comment:
   @ayushtkn 
   1. You haven't introduced any new variables, but you can rename the other 
variables in this PR as well, for consistency.
   2. For catName, dbName and tableName it's better to check and normalize 
everywhere, which helps make sure that Hive is case-insensitive for names.
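
As a rough illustration of the "check and normalize" point, here is a minimal sketch. It assumes the NPE arises when a missing (null) name reaches a normalizer that expects a non-null string; the archive does not confirm this, and the changed line in the diff above is truncated, so this is not the PR's actual fix. Both methods are hypothetical stand-ins, not Hive's own utilities.

```java
import java.util.Locale;

// Hypothetical helpers for illustration; not the code from the pull request.
final class IdentifierSketch {

    // Stand-in normalizer that assumes a non-null argument:
    // trims the name and lower-cases it so lookups are case-insensitive.
    static String normalizeIdentifier(String identifier) {
        return identifier.trim().toLowerCase(Locale.ROOT);
    }

    // Null-tolerant wrapper: a missing name stays null instead of throwing.
    static String normalizeNullable(String identifier) {
        return identifier == null ? null : normalizeIdentifier(identifier);
    }

    public static void main(String[] args) {
        System.out.println(normalizeNullable("  Persons "));
        System.out.println(normalizeNullable(null));
    }
}
```

What the caller should do with a null database name (for example, fall back to a default) is a separate design decision that this sketch leaves open.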







Issue Time Tracking
---

Worklog Id: (was: 504422)
Time Spent: 1h 40m  (was: 1.5h)

> Fetching primaryKey through beeline fails with NPE
> --
>
> Key: HIVE-23935
> URL: https://issues.apache.org/jira/browse/HIVE-23935
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>


[jira] [Commented] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220022#comment-17220022
 ] 

Qiang.Kang commented on HIVE-24311:
---

I just added a patch for this issue. Could anybody help review the code? Thanks!
 

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24311:
--
Labels: pull-request-available  (was: )

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?focusedWorklogId=504417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504417
 ]

ASF GitHub Bot logged work on HIVE-24311:
-

Author: ASF GitHub Bot
Created on: 24/Oct/20 07:38
Start Date: 24/Oct/20 07:38
Worklog Time Spent: 10m 
  Work Description: tkanng opened a new pull request #1609:
URL: https://github.com/apache/hive/pull/1609


   
   
   
   ### What changes were proposed in this pull request?
   
   Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as 
`addCursor`, `itrCursor`, etc., without resetting the read blocks.
   
   `currentReadBlock` and `currentWriteBlock` account for most of the memory 
used by a `Rowcontainer` and might be very large, depending on the data 
pattern.
   
   `currentReadBlock` and `currentWriteBlock` won't be the same object after the 
`Rowcontainer` has flushed data to disk.
   
   Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows 
will prevent OOM.
   
   Therefore, I have submitted a patch to reset the read blocks for `Rowcontainer`, 
just as `PTFRowcontainer` does.
   
   
   ### Why are the changes needed?
   
   
   Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows 
will prevent OOM.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   
   Unit tests.





Issue Time Tracking
---

Worklog Id: (was: 504417)
Remaining Estimate: 0h
Time Spent: 10m

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang.Kang updated HIVE-24311:
--
Description: 
Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as 
`addCursor`, `itrCursor`, etc., without resetting the read blocks.

`currentReadBlock` and `currentWriteBlock` account for most of the memory 
used by a `Rowcontainer` and might be very large, depending on the data 
pattern.

`currentReadBlock` and `currentWriteBlock` won't be the same object after the 
`Rowcontainer` has flushed data to disk.

Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will 
prevent OOM.

Therefore, I have submitted a patch to reset the read blocks for `Rowcontainer`, 
just as `PTFRowcontainer` does.

  was:
Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as 
`addCursor`, `itrCursor`, etc., without resetting the read blocks.

`currentReadBlock` and `currentWriteBlock` account for most of the memory 
used by a `Rowcontainer` and might be very large, depending on the data 
pattern.

`currentReadBlock` and `currentWriteBlock` won't be the same object after the 
`Rowcontainer` has flushed data to disk.

Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will 
prevent OOM.


> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>


[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang.Kang updated HIVE-24311:
--
Description: 
Hi, we found that `Rowcontainer.clearRows()` only resets some indexes, such as 
`addCursor`, `itrCursor`, etc., without resetting the read blocks.

`currentReadBlock` and `currentWriteBlock` account for most of the memory 
used by a `Rowcontainer` and might be very large, depending on the data 
pattern.

`currentReadBlock` and `currentWriteBlock` won't be the same object after the 
`Rowcontainer` has flushed data to disk.

Resetting `currentReadBlock` and `currentWriteBlock` while clearing rows will 
prevent OOM.

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>


[jira] [Updated] (HIVE-24311) Rowcontainer should reset readBlocks when we clear rows to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang.Kang updated HIVE-24311:
--
Summary: Rowcontainer should reset readBlocks when we clear rows to prevent 
OOM.  (was: Rowcontainer should also reset readBlocks when it clear rows to 
prevent OOM.)

> Rowcontainer should reset readBlocks when we clear rows to prevent OOM.
> ---
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>






[jira] [Updated] (HIVE-24311) Rowcontainer should also reset readBlocks when it clear rows to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang.Kang updated HIVE-24311:
--
Summary: Rowcontainer should also reset readBlocks when it clear rows to 
prevent OOM.  (was: When we clear rows in a Rowcontainer,  readBlocks should 
also be reset to prevent OOM.)

> Rowcontainer should also reset readBlocks when it clear rows to prevent OOM.
> 
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>






[jira] [Assigned] (HIVE-24311) When we clear rows in a Rowcontainer, readBlocks should also be reset to prevent OOM.

2020-10-24 Thread Qiang.Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang.Kang reassigned HIVE-24311:
-


> When we clear rows in a Rowcontainer,  readBlocks should also be reset to 
> prevent OOM.
> --
>
> Key: HIVE-24311
> URL: https://issues.apache.org/jira/browse/HIVE-24311
> Project: Hive
>  Issue Type: Bug
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Major
>






[jira] [Updated] (HIVE-23685) Removing user's extra resources while executing File Merge Task

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23685:
--
Labels: pull-request-available  (was: )

> Removing user's extra resources while executing File Merge Task
> ---
>
> Key: HIVE-23685
> URL: https://issues.apache.org/jira/browse/HIVE-23685
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Query Planning
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23685.02.patch, HIVE-23685.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi, we found that MapReduce's file merge map containers download the user's 
> extra resources (such as added jars, files, and archives) before launching 
> the task. When these resources are large or the network is busy, file merge 
> jobs time out, causing the query to fail. A file merge task runs correctly 
> with just hive-exec.jar and the MapReduce framework, so there is no need to 
> download the user's resources. The patch below prevents setting `tmpjars` 
> for the FileMerge task.
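
The approach described above can be sketched as follows. This is a hedged illustration only: a plain `Map` stands in for the job configuration, and the class, the method name `confForMergeTask`, and the sample values are hypothetical. Only the `tmpjars` key comes from the issue text; the attached patch itself is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for the real job configuration.
final class MergeJobConfSketch {

    // Returns a copy of the query's configuration without the `tmpjars` entry,
    // so a merge-only job would not ship the user's added resources.
    static Map<String, String> confForMergeTask(Map<String, String> queryConf) {
        Map<String, String> mergeConf = new HashMap<>(queryConf);
        mergeConf.remove("tmpjars");
        return mergeConf;
    }

    public static void main(String[] args) {
        Map<String, String> queryConf = new HashMap<>();
        queryConf.put("tmpjars", "file:///tmp/udf.jar"); // made-up sample value
        queryConf.put("mapreduce.job.name", "merge");    // made-up sample value
        System.out.println(confForMergeTask(queryConf).containsKey("tmpjars"));
    }
}
```

Working on a copy leaves the original query configuration untouched, so other tasks of the same query would still see the user's resources.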





[jira] [Work logged] (HIVE-23685) Removing user's extra resources while executing File Merge Task

2020-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23685?focusedWorklogId=504414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-504414
 ]

ASF GitHub Bot logged work on HIVE-23685:
-

Author: ASF GitHub Bot
Created on: 24/Oct/20 06:33
Start Date: 24/Oct/20 06:33
Worklog Time Spent: 10m 
  Work Description: tkanng opened a new pull request #1608:
URL: https://github.com/apache/hive/pull/1608


   Hi, we found that MapReduce's file merge map containers download the user's 
extra resources (such as added jars, files, and archives) before launching the 
task. When these resources are large or the network is busy, file merge jobs 
time out, causing the query to fail. A file merge task runs correctly with just 
hive-exec.jar and the MapReduce framework, so there is no need to download the 
user's resources. The patch below prevents setting `tmpjars` for the FileMerge 
task.
   
   
   
   
   ### What changes were proposed in this pull request?
   
   
   Remove `tmpjars` for `MergeFileTask`.
   
   
   
   
   ### Why are the changes needed?
   
   
   For queries that add large jars, `MergeFileTask` does not need to download 
the extra resources, and when the network is slow its containers might time 
out.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   
   It has been tested in our production environment.
   
   





Issue Time Tracking
---

Worklog Id: (was: 504414)
Remaining Estimate: 0h
Time Spent: 10m

> Removing user's extra resources while executing File Merge Task
> ---
>
> Key: HIVE-23685
> URL: https://issues.apache.org/jira/browse/HIVE-23685
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Query Planning
>Affects Versions: All Versions
>Reporter: Qiang.Kang
>Assignee: Qiang.Kang
>Priority: Critical
> Attachments: HIVE-23685.02.patch, HIVE-23685.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>