[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=620663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620663 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 08/Jul/21 18:53 Start Date: 08/Jul/21 18:53 Worklog Time Spent: 10m Work Description: kishendas closed pull request #2285: URL: https://github.com/apache/hive/pull/2285 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 620663) Time Spent: 1h 10m (was: 1h) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489) > at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=605420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605420 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 02/Jun/21 18:03 Start Date: 02/Jun/21 18:03 Worklog Time Spent: 10m Work Description: kishendas commented on pull request #2285: URL: https://github.com/apache/hive/pull/2285#issuecomment-853267248 @harishjp Can you please take one more look at this ? I checked the failed test and it looks unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 605420) Time Spent: 1h (was: 50m) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489) > at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) --
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=599660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599660 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 20/May/21 08:00 Start Date: 20/May/21 08:00 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #2285: URL: https://github.com/apache/hive/pull/2285#discussion_r635866315 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ## @@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String filename) { * @param filename * filename to extract taskid from */ - private static String getPrefixedTaskIdFromFilename(String filename) { + static String getPrefixedTaskIdFromFilename(String filename) { return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX); } private static String getTaskIdFromFilename(String filename, Pattern pattern) { -return getIdFromFilename(filename, pattern, 1); +return getIdFromFilename(filename, pattern, 1, false); } - private static int getAttemptIdFromFilename(String filename) { -String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3); + static int getAttemptIdFromFilename(String filename) { +String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true); return Integer.parseInt(attemptStr.substring(1)); } - private static String getIdFromFilename(String filename, Pattern pattern, int group) { + private static String getIdFromFilename(String filename, Pattern pattern, int group, boolean extractAttemptId) { String taskId = filename; int dirEnd = filename.lastIndexOf(Path.SEPARATOR); -if (dirEnd != -1) { +if (dirEnd!=-1) { taskId = filename.substring(dirEnd + 1); } -Matcher m = pattern.matcher(taskId); -if (!m.matches()) { - LOG.warn("Unable to get task id from file name: {}. 
Using last component {}" - + " as task id.", filename, taskId); +// Spark emitted files have the format part-[number-string]-uuid.. +// Examples: part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 is the taskId +// and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 4-c6acfdee is the taskId +String strings[] = taskId.split("-"); Review comment: @harishjp Please review now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599660) Time Spent: 50m (was: 40m) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concaten
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=599051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-599051 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 19/May/21 07:43 Start Date: 19/May/21 07:43 Worklog Time Spent: 10m Work Description: kishendas commented on a change in pull request #2285: URL: https://github.com/apache/hive/pull/2285#discussion_r634990814 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ## @@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String filename) { * @param filename * filename to extract taskid from */ - private static String getPrefixedTaskIdFromFilename(String filename) { + static String getPrefixedTaskIdFromFilename(String filename) { return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX); } private static String getTaskIdFromFilename(String filename, Pattern pattern) { -return getIdFromFilename(filename, pattern, 1); +return getIdFromFilename(filename, pattern, 1, false); } - private static int getAttemptIdFromFilename(String filename) { -String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3); + static int getAttemptIdFromFilename(String filename) { +String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true); return Integer.parseInt(attemptStr.substring(1)); } - private static String getIdFromFilename(String filename, Pattern pattern, int group) { + private static String getIdFromFilename(String filename, Pattern pattern, int group, boolean extractAttemptId) { String taskId = filename; int dirEnd = filename.lastIndexOf(Path.SEPARATOR); -if (dirEnd != -1) { +if (dirEnd!=-1) { taskId = filename.substring(dirEnd + 1); } -Matcher m = pattern.matcher(taskId); -if (!m.matches()) { - LOG.warn("Unable to get task id from file name: {}. 
Using last component {}" - + " as task id.", filename, taskId); +// Spark emitted files have the format part-[number-string]-uuid.. +// Examples: part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 is the taskId +// and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 4-c6acfdee is the taskId +String strings[] = taskId.split("-"); Review comment: Agreed. I am not sure if the file format has changed in recent times. Let me talk to the Spark team and get their insights as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 599051) Time Spent: 40m (was: 0.5h) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeT
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=598986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598986 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 19/May/21 03:09 Start Date: 19/May/21 03:09 Worklog Time Spent: 10m Work Description: harishjp commented on a change in pull request #2285: URL: https://github.com/apache/hive/pull/2285#discussion_r634882155 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ## @@ -1252,32 +1253,45 @@ public static String getTaskIdFromFilename(String filename) { * @param filename * filename to extract taskid from */ - private static String getPrefixedTaskIdFromFilename(String filename) { + static String getPrefixedTaskIdFromFilename(String filename) { return getTaskIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX); } private static String getTaskIdFromFilename(String filename, Pattern pattern) { -return getIdFromFilename(filename, pattern, 1); +return getIdFromFilename(filename, pattern, 1, false); } - private static int getAttemptIdFromFilename(String filename) { -String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3); + static int getAttemptIdFromFilename(String filename) { +String attemptStr = getIdFromFilename(filename, FILE_NAME_PREFIXED_TASK_ID_REGEX, 3, true); return Integer.parseInt(attemptStr.substring(1)); } - private static String getIdFromFilename(String filename, Pattern pattern, int group) { + private static String getIdFromFilename(String filename, Pattern pattern, int group, boolean extractAttemptId) { String taskId = filename; int dirEnd = filename.lastIndexOf(Path.SEPARATOR); -if (dirEnd != -1) { +if (dirEnd!=-1) { taskId = filename.substring(dirEnd + 1); } -Matcher m = pattern.matcher(taskId); -if (!m.matches()) { - LOG.warn("Unable to get task id from file name: {}. 
Using last component {}" - + " as task id.", filename, taskId); +// Spark emitted files have the format part-[number-string]-uuid.. +// Examples: part-00026-23003837-facb-49ec-b1c4-eeda902cacf3.c000.zlib.orc, 00026-23003837 is the taskId +// and part-4-c6acfdee-0c32-492e-b209-c2f1cf40.c000, 4-c6acfdee is the taskId +String strings[] = taskId.split("-"); Review comment: This looks a bit fragile and can match a lot of files. We should use slightly better pattern matching here to ensure we do not accidentally match other files. Something along the lines of "part-(\d+-[a-f0-9]+)- "... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 598986) Time Spent: 0.5h (was: 20m) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoo
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=598389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598389 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 18/May/21 01:34 Start Date: 18/May/21 01:34 Worklog Time Spent: 10m Work Description: kishendas commented on pull request #2285: URL: https://github.com/apache/hive/pull/2285#issuecomment-842756108 @harishjp Please review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 598389) Time Spent: 20m (was: 10m) > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:489) > at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25130) alter table concat gives NullPointerException, when data is inserted from Spark
[ https://issues.apache.org/jira/browse/HIVE-25130?focusedWorklogId=598388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598388 ] ASF GitHub Bot logged work on HIVE-25130: - Author: ASF GitHub Bot Created on: 18/May/21 01:33 Start Date: 18/May/21 01:33 Worklog Time Spent: 10m Work Description: kishendas opened a new pull request #2285: URL: https://github.com/apache/hive/pull/2285 ### What changes were proposed in this pull request? Fix NullPointerException during alter table concat, after data is inserted from Spark. ### Why are the changes needed? To fix NPE. 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) at org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) at org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) at org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a unit test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 598388) Remaining Estimate: 0h Time Spent: 10m > alter table concat gives NullPointerException, when data is inserted from > Spark > --- > > Key: HIVE-25130 > URL: https://issues.apache.org/jira/browse/HIVE-25130 > Project: Hive > Issue Type: Bug >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is the complete stack trace of the NullPointerException > 2021-03-01 14:50:32,201 ERROR org.apache.hadoop.hive.ql.exec.Task: > [HiveServer2-Background-Pool: Thread-76760]: Job Commit failed with exception > 'java.lang.NullPointerException(null)' > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.Utilities.getAttemptIdFromFilename(Utilities.java:1333) > at > org.apache.hadoop.hive.ql.exec.Utilities.compareTempOrDuplicateFiles(Utilities.java:1966) > at > org.apache.hadoop.hive.ql.exec.Utilities.ponderRemovingTempOrDuplicateFile(Utilities.java:1907) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFilesNonMm(Utilities.java:1892) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1797) > at > org.apache.hadoop.hive.ql.exec.Utilities.removeTempOrDuplicateFiles(Utilities.java:1674) > at > org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1544) > at > org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.jobCloseOp(AbstractFileMergeOperator.java:304) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:637) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335) > at > org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.executeTask(AlterTableConcatenateOperation.java:129) > at > 
org.apache.hadoop.hive.ql.ddl.table.storage.concatenate.AlterTableConcatenateOperation.execute(AlterTableConcatenateOperation.java:63) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:740) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:495) > at org.apache.hadoo