[jira] [Updated] (HIVE-18429) Compaction should handle a case when it produces no output

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18429:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Prasanth for the review

> Compaction should handle a case when it produces no output
> --
>
> Key: HIVE-18429
> URL: https://issues.apache.org/jira/browse/HIVE-18429
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18429.01.patch, HIVE-18429.02.patch, 
> HIVE-18429.03.patch
>
>
> Suppose we start with empty delta_8_8 and delta_9_9 and compaction runs.
> It will currently produce an MR job with 0 splits and so 
> {{CompactorMR.TMP_LOCATION}} never gets created.  This causes 
> {{CompactorOutputCommitter.commitJob()}} to fail when it tries to do 
>  {{FileStatus[] contents = fs.listStatus(tmpLocation);}} since tmpLocation 
> doesn't exist.
> If the compactor fails to produce delta_8_9 here, it will fail to do further 
> compaction unless a new delta with data is created.  
> If the number of empty deltas is greater than 
> HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, compaction will not be able to 
> proceed at all.
> It should produce a delta_8_9 in this case even if it's empty.
> The error (in the log of standalone metastore process) would look like this
> {noformat}
> 2017-12-27 17:19:28,850 ERROR CommitterEvent Processor #1 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not 
> commit job
> java.io.FileNotFoundException: File 
> hdfs://OTCHaaS/apps/hive/warehouse/momi.db/sensor_data/babyid=5911806ebf6964014257/_tmp_b4c5a3f3-44e5-4d45-86af-5b773bf0fc96
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:785)
> at 
> org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
> at  
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
> at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
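
The failure mode described above (listing a temporary location that was never created) can be illustrated with a minimal sketch. It uses java.nio.file as a stand-in for the Hadoop FileSystem API, and the class and method names are hypothetical; it is not the change made in the attached patches, only one way to guard a listing against a missing directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EmptyCompactionSketch {
    // Hypothetical helper: make sure the temporary location exists before
    // listing it, so a job that wrote nothing yields an empty listing
    // instead of a "file does not exist" error.
    static List<Path> listOrCreate(Path tmpLocation) throws IOException {
        Files.createDirectories(tmpLocation); // no-op if it already exists
        try (Stream<Path> entries = Files.list(tmpLocation)) {
            return entries.collect(Collectors.toList());
        }
    }
}
```

With a guard of this kind, a commit step could still move an empty directory into place, which matches the request that an empty delta_8_9 be produced.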



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18419) CliDriver loads different hive-site.xml into HiveConf and MetastoreConf

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18419:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
 Release Note: n/a
   Status: Resolved  (was: Patch Available)

committed to master
 thanks Alan for the review

> CliDriver loads different hive-site.xml into HiveConf and MetastoreConf
> ---
>
> Key: HIVE-18419
> URL: https://issues.apache.org/jira/browse/HIVE-18419
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Test
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18419.01.patch, HIVE-18419.02.patch
>
>
> Various forms of CliDriver use CliConfigs to set the 'confDir' below  (e.g. 
> CliConfigs.SparkOnYarnCliConfig() used by TestMiniSparkOnYarnCliDriver).
> QTestUtil.QTestUtil() has
> {noformat}
> if (confDir != null && !confDir.isEmpty()) {
>   HiveConf.setHiveSiteLocation(new URL("file://"+ new 
> File(confDir).toURI().getPath() + "/hive-site.xml"));
>   MetastoreConf.setHiveSiteLocation(HiveConf.getHiveSiteLocation());
>   System.out.println("Setting hive-site: 
> "+HiveConf.getHiveSiteLocation());
> }
> {noformat}
> This causes HiveConf.initialize() to load hive-site.xml from that location.
> MetastoreConf only loads hive-site.xml from the classpath, which in the test 
> environment picks up data/conf/hive-site.xml.
> So different parts of the system may end up disagreeing about property values.
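
A side note on the quoted snippet: concatenating "file://" with File.toURI().getPath() can leave a doubled slash before hive-site.xml when confDir is an existing directory, because File.toURI() ends directory URIs with a slash. The following is a minimal sketch of an alternative that relies only on java.io.File; the class and method names are hypothetical and this is not taken from the patch.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

final class HiveSiteLocationSketch {
    // Hypothetical helper: builds the URL of hive-site.xml inside confDir
    // by letting File resolve the child path before converting to a URL.
    static URL hiveSiteUrl(String confDir) throws MalformedURLException {
        return new File(confDir, "hive-site.xml").toURI().toURL();
    }
}
```

Whatever URL is built, the point of the issue is that HiveConf and MetastoreConf should both be given the same location.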





[jira] [Updated] (HIVE-13563) Hive Streaming does not honor orc.compress.size and orc.stripe.size table properties

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13563:
--
Component/s: Transactions

> Hive Streaming does not honor orc.compress.size and orc.stripe.size table 
> properties
> 
>
> Key: HIVE-13563
> URL: https://issues.apache.org/jira/browse/HIVE-13563
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Major
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13563.1.patch, HIVE-13563.2.patch, 
> HIVE-13563.3.patch, HIVE-13563.4.patch, HIVE-13563.branch-1.patch
>
>
> According to the doc:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax
> One should be able to specify tblproperties for many ORC options.
> But the settings for orc.compress.size and orc.stripe.size don't take effect.





[jira] [Assigned] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18460:
-


> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}
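
The effect of dropping the table properties can be sketched without the ORC classes. The example below assumes that a table property, when present, should win over the job configuration, which in turn wins over a built-in default; the class, the method, and the use of a plain Map in place of a Hadoop Configuration are all hypothetical.

```java
import java.util.Map;
import java.util.Properties;

final class WriterOptionsSketch {
    // Hypothetical lookup order: table property, then configuration, then default.
    static String resolve(String key, Properties tableProperties,
                          Map<String, String> conf, String defaultValue) {
        if (tableProperties != null) {
            String fromTable = tableProperties.getProperty(key);
            if (fromTable != null) {
                return fromTable;
            }
        }
        String fromConf = conf.get(key);
        return fromConf != null ? fromConf : defaultValue;
    }
}
```

Passing null for tableProperties models the reported behavior: the lookup falls through to the configuration and the table-level setting is never seen.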





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Affects Version/s: (was: 0.14.0)
   0.13

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Commented] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327885#comment-16327885
 ] 

Eugene Koifman commented on HIVE-18460:
---

Because of this, any properties overridden as part of an Alter Table statement 
requesting compaction (HIVE-13354) won't be honored either.

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Attachment: HIVE-18460.01.patch

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Status: Patch Available  (was: Open)

[~prasanth_j] could you review please

FYI, [~jcamachorodriguez]

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Commented] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328044#comment-16328044
 ] 

Eugene Koifman commented on HIVE-18460:
---

"size4" is not a typo - 4 is the length of the value of the property.  This is 
how the table properties (a Properties object) are encoded in the Configuration.
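
To see how a string such as "size4" can arise, here is a minimal sketch of a length-prefixed map encoding. The layout (entry count, then length-prefixed key and value for each entry) is an assumption modeled on the comment above, not a confirmed copy of Hive's own encoding, and the class name and the sample property value are illustrative only.

```java
import java.util.Map;

final class LengthPrefixedMapSketch {
    // Encodes a map as "<count>:" followed, for each entry, by
    // "<keyLength>:<key><valueLength>:<value>". Because nothing separates
    // a key from the length of its value, a key ending in "size" followed
    // by a 4-character value reads as "size4".
    static String encode(Map<String, String> map) {
        StringBuilder buf = new StringBuilder();
        buf.append(map.size()).append(':');
        for (Map.Entry<String, String> e : map.entrySet()) {
            buf.append(e.getKey().length()).append(':').append(e.getKey());
            buf.append(e.getValue().length()).append(':').append(e.getValue());
        }
        return buf.toString();
    }
}
```

Under this assumed layout, a single entry mapping orc.compress.size to 8192 encodes as 1:17:orc.compress.size4:8192.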

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Commented] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328076#comment-16328076
 ] 

Eugene Koifman commented on HIVE-18460:
---

patch 2 has the additional test

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Attachment: HIVE-18460.02.patch

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Commented] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328109#comment-16328109
 ] 

Eugene Koifman commented on HIVE-18460:
---

Compactor uses StringableMap to pass a HashMap as a string.  I don't remember 
the details of the mechanics.

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Attachment: HIVE-18460.03.patch

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Status: Open  (was: Patch Available)

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Attachment: HIVE-18460.04.patch

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch, HIVE-18460.04.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Status: Patch Available  (was: Open)

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch, HIVE-18460.04.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Commented] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-18 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330801#comment-16330801
 ] 

Eugene Koifman commented on HIVE-18460:
---

no related failures

committed to master

thanks Prasanth for the review

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch, HIVE-18460.04.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18460) Compactor doesn't pass Table properties to the Orc writer

2018-01-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18460:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Compactor doesn't pass Table properties to the Orc writer
> -
>
> Key: HIVE-18460
> URL: https://issues.apache.org/jira/browse/HIVE-18460
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.13
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18460.01.patch, HIVE-18460.02.patch, 
> HIVE-18460.03.patch, HIVE-18460.04.patch
>
>
>  
>  CompactorMap.getWriter()/getDeleteEventWriter() both do 
> AcidOutputFormat.Options.tableProperties() but
> OrcOutputFormat.getRawRecordWriter() does
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getConfiguration());
> {noformat}
> which ignores tableProperties value.
> It should do 
> {noformat}
> final OrcFile.WriterOptions opts =
> OrcFile.writerOptions(options.getTableProperties(), 
> options.getConfiguration());
> {noformat}





[jira] [Updated] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-01-18 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18192:
--
Description: 
To support ACID replication, we will be introducing a per table write Id which 
will replace the transaction id in the primary key for each row in an ACID table.

The current primary key is determined via 
 

which will move to 
 

Each table modified by the given transaction will have a table-level write 
ID allocated, and a persisted map of global txn id -> table -> write id for 
that table has to be maintained to allow snapshot isolation.

Readers should use the combination of ValidTxnList and ValidWriteIdList(Table) 
for snapshot isolation.

 

 [Hive Replication - ACID 
Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
 has a section "Per Table Sequences (Write-Id)" with more details

  was:
To support ACID replication, we will be introducing a per table write Id which 
will replace the transaction id in the primary key for each row in an ACID table.

The current primary key is determined via 
 

which will move to 
 

Each table modified by the given transaction will have a table-level write 
ID allocated, and a persisted map of global txn id -> table -> write id for 
that table has to be maintained to allow snapshot isolation.

Readers should use the combination of ValidTxnList and ValidWriteIdList(Table) 
for snapshot isolation.


> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch
>
>
> To support ACID replication, we will be introducing a per table write Id 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table-level 
> write ID allocated, and a persisted map of global txn id -> table -> write 
> id for that table has to be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details
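
The bookkeeping described above, a per-table sequence plus a map from global txn id to table to write id, can be sketched in memory as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, write ids are assumed to start at 1, a transaction is assumed to reuse the id it already holds for a table, and the persistence the description calls for is left out.

```java
import java.util.HashMap;
import java.util.Map;

final class WriteIdSketch {
    // Last write id handed out for each table (the per-table sequence).
    private final Map<String, Long> lastWriteId = new HashMap<>();
    // global txn id -> table -> write id allocated to that txn for that table.
    private final Map<Long, Map<String, Long>> txnToTableWriteId = new HashMap<>();

    // Returns the write id of txnId for table, allocating one on first use.
    long allocate(long txnId, String table) {
        Map<String, Long> perTable =
            txnToTableWriteId.computeIfAbsent(txnId, k -> new HashMap<>());
        Long existing = perTable.get(table);
        if (existing != null) {
            return existing;
        }
        long id = lastWriteId.merge(table, 1L, Long::sum);
        perTable.put(table, id);
        return id;
    }
}
```

Because each table has its own sequence, two tables touched by the same transaction can receive unrelated write ids, which is why the mapping back to the global transaction id has to be kept.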





[jira] [Updated] (HIVE-18502) base_-9223372036854775808 is confusing

2018-01-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18502:
--
Component/s: Transactions

> base_-9223372036854775808 is confusing
> --
>
> Key: HIVE-18502
> URL: https://issues.apache.org/jira/browse/HIVE-18502
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 0.13.0
>Reporter: Eugene Koifman
>Priority: Minor
>
> Start with a non-acid table.
> Write some data.
> Alter Table to make it an acid table.
> Run major compaction.
> It will create base_-9223372036854775808  (if there are no delta dirs).
> Nothing is wrong with it, but it confuses users.  Should probably make it base_0.
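
The number in that directory name is the smallest value a Java long can hold, which suggests (as an assumption, not something the issue states) that a Long.MIN_VALUE sentinel is being formatted directly into the name. A minimal sketch with hypothetical helper names, including one possible way to arrive at base_0:

```java
final class BaseDirNameSketch {
    // Formats the id as-is, the way a sentinel would leak into the name.
    static String rawBaseDir(long id) {
        return "base_" + id;
    }

    // Hypothetical friendlier variant: negative ids are shown as 0.
    static String friendlyBaseDir(long id) {
        return "base_" + Math.max(id, 0L);
    }
}
```

Here rawBaseDir(Long.MIN_VALUE) yields base_-9223372036854775808, while friendlyBaseDir(Long.MIN_VALUE) yields base_0.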





[jira] [Commented] (HIVE-18503) MM/ACID tables: make tests that will never be compatible with acid use non-txn tables explicitly

2018-01-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333133#comment-16333133
 ] 

Eugene Koifman commented on HIVE-18503:
---

linking HIVE-18315 for completeness - it did a bunch of similar work

> MM/ACID tables: make tests that will never be compatible with acid use 
> non-txn tables explicitly
> 
>
> Key: HIVE-18503
> URL: https://issues.apache.org/jira/browse/HIVE-18503
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18503.WIP.patch
>
>
> Some tests do stuff that will simply never work with ACID tables, e.g. delete 
> table files.
> They should be changed to use an external table, or explicitly set 
> transactional=false





[jira] [Commented] (HIVE-18504) Hive is throwing InvalidObjectException(message:Invalid column type name is too long.

2018-01-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334702#comment-16334702
 ] 

Eugene Koifman commented on HIVE-18504:
---

2.6.3 looks like an HDP version

> Hive is throwing InvalidObjectException(message:Invalid column type name is 
> too long.
> -
>
> Key: HIVE-18504
> URL: https://issues.apache.org/jira/browse/HIVE-18504
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Jimson K James
>Assignee: Naveen Gangam
>Priority: Major
> Fix For: 2.3.0, 3.0.0
>
> Attachments: hive2.log, tweets.sql
>
>
> Hive 2.6.3 is still throwing InvalidObjectException(message:Invalid column 
> type name is too long.
> Please find attached the create table query. For more details please refer to 
> HIVE-15249
> {code:java}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> InvalidObjectException(message:Invalid column type name length 2980 exceeds 
> max allowed length 2000, type 
> struct,entities:struct,text:string>>,symbols:array...
> {code}
>  
> {code:java}
> [root@sandbox-hdp hive-json]# hive --version
> Hive 1.2.1000.2.6.3.0-235
> Subversion 
> git://ctr-e134-1499953498516-254436-01-04.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos6/SOURCES/hive
>  -r 5f360bda08bb5489fbb3189b5aeaaf58029ed4b5
> Compiled by jenkins on Mon Oct 30 02:48:31 UTC 2017
> From source with checksum 94298cc1f5f5bf0f3470f3ea2e92d646
> [root@sandbox-hdp hive-json]# beeline
> Beeline version 1.2.1000.2.6.3.0-235 by Apache Hive
> beeline> !connect 
> jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
> Connecting to 
> jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
> Enter username for 
> jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2:
>  hive
> Enter password for 
> jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2:
>  
> Connected to: Apache Hive (version 1.2.1000.2.6.3.0-235)
> Driver: Hive JDBC (version 1.2.1000.2.6.3.0-235)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://sandbox-hdp.hortonworks.com:2>
> {code}





[jira] [Updated] (HIVE-15658) hive.ql.session.SessionState start() is not atomic, SessionState thread local variable can get into inconsistent state

2018-01-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15658:
--
Component/s: Transactions

> hive.ql.session.SessionState start() is not atomic, SessionState thread local 
> variable can get into inconsistent state
> --
>
> Key: HIVE-15658
> URL: https://issues.apache.org/jira/browse/HIVE-15658
> Project: Hive
>  Issue Type: Bug
>  Components: API, HCatalog, Transactions
>Affects Versions: 1.1.0, 1.2.1, 2.0.0, 2.0.1
> Environment: CDH5.8.0, Flume 1.6.0, Hive 1.1.0
>Reporter: Michal Klempa
>Priority: Major
> Attachments: HIVE-15658_branch-1.2_1.patch, 
> HIVE-15658_branch-2.1_1.patch
>
>
> Method start() in hive.ql.session.SessionState is supposed to set up the 
> needed preconditions, such as HDFS scratch directories for the session.
> This setup is not atomic with setting the thread-local variable, which can 
> later be obtained by calling SessionState.get().
> Therefore, even if the start() method itself fails, SessionState.get() 
> does not return null, and further re-use of the thread that previously 
> invoked start() may yield a SessionState object in an inconsistent 
> state.
> I have observed this using Flume Hive Sink, which uses the Hive Streaming 
> interface. When the directory /tmp/hive is not writable by the session user, 
> the start() method fails (throwing RuntimeException). If the thread is re-used 
> (as it is in Flume), further executions work with a wrongly initialized 
> SessionState object (the HDFS dirs do not exist). In Flume, this happens 
> when Flume should create a partition if it does not exist (the code doing 
> this is in Hive Streaming).
> Steps to reproduce:
> 0. create test spooldir and allow flume to write to it, in my case 
> /home/ubuntu/flume_test, 775, ubuntu:flume
> 1. create Flume config (see attachment)
> 2. create Hive table
> {code}
> create table default.flume_test (column1 string, column2 string) partitioned 
> by (dt string) clustered by (column1) INTO 2 BUCKETS STORED AS ORC;
> {code}
> 3. start flume agent:
> {code}
> bin/flume-ng agent -n a1 -c conf -f conf/flume-config.txt
> {code}
> 4. hdfs dfs -chmod 600 /tmp/hive
> 5. put this file into spooldir:
> {code}
> echo value1,value2 > file1
> {code}
> Expected behavior:
> Exception regarding scratch dir permissions to be thrown repeatedly.
> Example (note that the line numbers differ from upstream because Cloudera 
> maintains its own copies of the source code at 
> https://github.com/cloudera/flume-ng/ and https://github.com/cloudera/hive):
> {code}
> 2017-01-18 12:39:38,926 WARN org.apache.flume.sink.hive.HiveSink: sink_hive_1 
> : Failed connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to 
> EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', database='default', 
> table='flume_test', partitionVals=[20170118] } 
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
> at 
> org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
> at 
> org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:296)
> at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:254)
> at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed 
> connecting to EndPoint {metaStoreUri='thrift://n02.cdh.ideata:9083', 
> database='default', table='flume_test', partitionVals=[20170118] }
> at 
> org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:380)
> at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:86)
> ... 6 more
> Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root 
> scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: 
> rw-------
> at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:540)
> at 
> org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.createPartitionIfNotExists(HiveEndPoint.java:358)
> at 
> org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:276)
> at 
> org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243)
> at 
> org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180)
> at 
> org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection
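
The failure mode described in HIVE-15658 can be illustrated with a minimal sketch. Everything here is hypothetical: SketchSession is a stand-in and not Hive's SessionState, and the "atomic" variant is only one possible remedy, not necessarily what the attached patches do. The sketch assumes the essence of the problem is that the thread-local is published before the setup step can fail.

```java
// Hypothetical stand-in for a session object; this is NOT Hive's SessionState.
final class SketchSession {
    private static final ThreadLocal<SketchSession> CURRENT = new ThreadLocal<>();
    private boolean scratchDirsReady = false;

    static SketchSession get() { return CURRENT.get(); }
    static void reset() { CURRENT.remove(); }
    boolean isReady() { return scratchDirsReady; }

    // Non-atomic variant: the session is published to the thread-local
    // before setup runs, so a failed setup leaves it visible to the thread.
    static void startNonAtomic(SketchSession s, boolean scratchDirWritable) {
        CURRENT.set(s);
        s.setUp(scratchDirWritable);
    }

    // One possible remedy (an assumption): publish only after setup succeeds,
    // and clear the thread-local if setup throws.
    static void startAtomic(SketchSession s, boolean scratchDirWritable) {
        try {
            s.setUp(scratchDirWritable);
            CURRENT.set(s);
        } catch (RuntimeException e) {
            CURRENT.remove();
            throw e;
        }
    }

    // Simulates the scratch-directory check failing on an unwritable dir.
    private void setUp(boolean writable) {
        if (!writable) {
            throw new RuntimeException("scratch dir should be writable");
        }
        scratchDirsReady = true;
    }
}
```

With startNonAtomic, a re-used thread (as in Flume) would still see a non-null but half-initialized session from get() after a failed start; with startAtomic, get() returns null and the caller can retry cleanly.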

[jira] [Assigned] (HIVE-18520) add current txnid to ValidTxnList

2018-01-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18520:
-


> add current txnid to ValidTxnList
> -
>
> Key: HIVE-18520
> URL: https://issues.apache.org/jira/browse/HIVE-18520
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> add the Id of the transaction that obtained this ValidTxnList
> if nothing else, convenient for debugging
> in particular include it in ErrorMsg.ACID_NOT_ENOUGH_HISTORY





[jira] [Updated] (HIVE-18519) do not create materialized CTEs with ACID/MM

2018-01-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18519:
--
Component/s: Transactions

> do not create materialized CTEs with ACID/MM
> 
>
> Key: HIVE-18519
> URL: https://issues.apache.org/jira/browse/HIVE-18519
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18519.patch
>
>






[jira] [Commented] (HIVE-18519) do not create materialized CTEs with ACID/MM

2018-01-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336844#comment-16336844
 ] 

Eugene Koifman commented on HIVE-18519:
---

+1 pending tests

> do not create materialized CTEs with ACID/MM
> 
>
> Key: HIVE-18519
> URL: https://issues.apache.org/jira/browse/HIVE-18519
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18519.patch
>
>






[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-01-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338208#comment-16338208
 ] 

Eugene Koifman commented on HIVE-18192:
---

Suppose A is the set of all ValidTxnList across all active readers.  Each 
ValidTxnList has minOpenTxnId.

MIN_HISTORY_LEVEL allows us to determine min(minOpenTxnId) across all currently 
active readers.  It's not the same as COMPLETED_TXN_COMPONENTS.

Entries from COMPLETED_TXN_COMPONENTS get removed once compaction has processed 
the relevant partition which can happen before all readers that think txn X is 
open have gone away.

 

Suppose txn 17 starts at t1 and sees txnid 13 with writeID 13 open.

13 commits (via its parent txn) at t2 > t1.  (17 is still running).

Compaction runs at t3 >t2 to produce base_14 (or delta_10_14 for example) on 
Table1/Part1 (17 is still running)

COMPLETED_TXN_COMPONENTS may be cleaned at this point.

At t4 > t3, 17 (a multi-statement txn) may need to read Table1/Part1.  It now 
needs to construct a ValidWriteIDList as it would have looked at the time 17 
started, in order to maintain SI (snapshot isolation).

But if TXNS2WRITE_ID no longer has txnid13 -> writeID13 mapping, there is no 
way to know that ValidWriteIDList should have writeID13 Open.   
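
The timeline above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the map stands in for the persisted txn id -> write id mapping (TXNS2WRITE_ID in the comment above), and nothing here is taken from the actual patch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not Hive's ValidWriteIdList implementation.
final class WriteIdSnapshotSketch {
    // Translates the txn ids a reader saw as open when it started into the
    // write ids that must be treated as open for one table.
    static Set<Long> openWriteIds(Set<Long> openTxnIds, Map<Long, Long> txnToWriteId) {
        Set<Long> result = new TreeSet<>();
        for (long txnId : openTxnIds) {
            Long writeId = txnToWriteId.get(txnId);
            if (writeId != null) {
                result.add(writeId);
            }
            // If the mapping row has been cleaned up, the open write id is
            // silently lost, which is the problem described above.
        }
        return result;
    }
}
```

With the mapping 13 -> 13 present, a reader whose snapshot has txn 13 open gets write id 13 as open. Once that row is removed, the same snapshot yields an empty set, so the reader could wrongly treat data written under write id 13 as visible.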

 

 

 

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch
>
>
> To support ACID replication, we will be introducing a per-table write ID 
> which will replace the transaction ID in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by a given transaction will have a table-level write ID 
> allocated, and a persisted map of global txn id -> table -> write id has to 
> be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details





[jira] [Updated] (HIVE-18157) Vectorization : Insert in bucketed table is broken with vectorization

2018-01-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18157:
--
Component/s: Vectorization
 Transactions

> Vectorization : Insert in bucketed table is broken with vectorization
> -
>
> Key: HIVE-18157
> URL: https://issues.apache.org/jira/browse/HIVE-18157
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions, Vectorization
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18157.1.patch, HIVE-18157.3.patch
>
>
> create temporary table foo (x int) clustered by (x) into 4 buckets;
> insert overwrite table foo values(1),(2),(3),(4),(9);
> select *, regexp_extract(INPUT__FILE__NAME, '.*/(.*)', 1) from foo;
> OK
> 9   00_0
> 4   00_0
> 3   00_0
> 2   00_0
> 1   00_0
> set hive.vectorized.execution.enabled=false;
> insert overwrite table foo values(1),(2),(3),(4),(9);
> select *, regexp_extract(INPUT__FILE__NAME, '.*/(.*)', 1) from foo;
> OK
> 4   00_0
> 9   01_0
> 1   01_0
> 2   02_0
> 3   03_0
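
The non-vectorized output above is consistent with a simple "hash modulo bucket count" rule. The sketch below assumes, as a labeled assumption and not a statement about the exact Hive code path, that an int column hashes to its own value and that the bucket index is (hash & Integer.MAX_VALUE) % numBuckets. The class and method names are hypothetical.

```java
// Hypothetical sketch of bucket assignment for an int column.
final class BucketSketch {
    // Assumption: an int hashes to itself; the mask keeps the result
    // non-negative before taking the remainder.
    static int bucketFor(int value, int numBuckets) {
        int hash = value;
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }
}
```

Under this rule, with 4 buckets, the values 1, 2, 3, 4 and 9 land in buckets 1, 2, 3, 0 and 1 respectively, which matches the second (non-vectorized) result, whereas the vectorized run wrote every row to the same file.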





[jira] [Updated] (HIVE-17457) IOW Acid Insert Overwrite when the transaction fails

2018-01-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17457:
--
Summary: IOW Acid Insert Overwrite when the transaction fails  (was: Acid 
Insert Overwrite when the transaction fails)

> IOW Acid Insert Overwrite when the transaction fails
> 
>
> Key: HIVE-17457
> URL: https://issues.apache.org/jira/browse/HIVE-17457
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> HIVE-14988 adds support for Insert Overwrite for Acid tables.
> Once we have direct write to the target dir (i.e. no move op), how do we 
> handle the case where the txn running IOW aborts?  Check whether 
> getAcidState() does the right thing.





[jira] [Updated] (HIVE-18154) IOW Acid Load Data/Insert with Overwrite in multi statement transactions

2018-01-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18154:
--
Summary: IOW Acid Load Data/Insert with Overwrite in multi statement 
transactions  (was: Acid Load Data/Insert with Overwrite in multi statement 
transactions)

> IOW Acid Load Data/Insert with Overwrite in multi statement transactions
> 
>
> Key: HIVE-18154
> URL: https://issues.apache.org/jira/browse/HIVE-18154
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> Consider:
> {noformat}
> START TRANSACTION
> insert into T values(1,2),(3,4)
> load data local inpath '" + getWarehouseDir() + "/1/data' overwrite into 
> table T
> update T set a = 0 where a = 6
> COMMIT
> {noformat}
> So what we should have on disk is
> {noformat}
> ├── base_028
> │   ├── 00_0
> │   └── _metadata_acid
> ├── delete_delta_028_028_0002
> │   └── bucket_0
> ├── delta_028_028_
> │   └── bucket_0
> └── delta_028_028_0002
>     └── bucket_0
> {noformat}
> where base_28 is from overwrite, delta_028_028_ from 1st insert 
> and delta_028_028_0002/delete_delta_028_028_0002 is from 
> update.
> AcidUtils.getAcidState() only returns base_28 thinking that all other deltas 
> are included in it - not what we want here.  
> Same applies for Insert Overwrite.
> The simple way to get correct behavior is to disallow commands with Overwrite 
> clause in multi-statement txns.
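
The behavior described in HIVE-18154 can be illustrated with a simplified directory-selection rule. This is not AcidUtils.getAcidState(); it is a hypothetical sketch that assumes the reader picks the highest base and discards every delta whose maximum id is not greater than the base id. The directory names used with it are simplified (no zero padding) and are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT Hive's AcidUtils.getAcidState().
final class AcidDirSketch {
    // Returns the best base plus the deltas assumed not to be covered by it.
    static List<String> visibleDirs(List<String> dirs) {
        long bestBase = -1;
        String baseName = null;
        for (String d : dirs) {
            if (d.startsWith("base_")) {
                long id = Long.parseLong(d.substring("base_".length()));
                if (id > bestBase) {
                    bestBase = id;
                    baseName = d;
                }
            }
        }
        List<String> result = new ArrayList<>();
        if (baseName != null) {
            result.add(baseName);
        }
        for (String d : dirs) {
            if (d.startsWith("delta_") || d.startsWith("delete_delta_")) {
                // Name layout assumed: [delete_]delta_<min>_<max>[_<stmtId>]
                String ids = d.substring(d.indexOf("delta_") + "delta_".length());
                long max = Long.parseLong(ids.split("_")[1]);
                // Simplified rule: a delta at or below the base id is
                // treated as already included in the base.
                if (max > bestBase) {
                    result.add(d);
                }
            }
        }
        return result;
    }
}
```

Given base_28 together with delta_28_28_0, delta_28_28_2 and delete_delta_28_28_2, this rule returns only base_28, even though in the multi-statement case the update's deltas were written after the overwrite and should still be read.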





[jira] [Updated] (HIVE-18126) IOW Mechanics of multiple commands with OVERWRITE in a singe transactions

2018-01-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18126:
--
Summary: IOW Mechanics of multiple commands with OVERWRITE in a singe 
transactions  (was: Mechanics of multiple commands with OVERWRITE in a singe 
transactions)

> IOW Mechanics of multiple commands with OVERWRITE in a singe transactions
> -
>
> Key: HIVE-18126
> URL: https://issues.apache.org/jira/browse/HIVE-18126
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> For Insert Overwrite/Load Data Overwrite we create base_x/ to hold the data 
> and are thus able to make the Overwrite command non-blocking.  
> What happens if multiple IOWs are run against the same table/partition in the 
> same transaction?
> Should base support a suffix (base_x_000) like deltas do?





[jira] [Commented] (HIVE-18536) IOW + DP is broken for insert-only ACID

2018-01-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343649#comment-16343649
 ] 

Eugene Koifman commented on HIVE-18536:
---

I left 1 comment on RB.

The code to create/parse base/delta dir names has spread to different places 
instead of being localized to AcidUtils.  This will come back to haunt us, but 
it was not introduced in this patch.

There are a bunch of new checkstyle warnings.

Otherwise it looks ok.

 

> IOW + DP is broken for insert-only ACID
> ---
>
> Key: HIVE-18536
> URL: https://issues.apache.org/jira/browse/HIVE-18536
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18536.patch
>
>






[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-01-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344129#comment-16344129
 ] 

Eugene Koifman commented on HIVE-18575:
---

 

hive.acid.table.scan and TABLE_IS_TRANSACTIONAL = "transactional" (as in 
AcidUtils.isTablePropertyTransactional), when set on Configuration, were 
always meant for full acid and only full acid.

I think this is the part that got broken: it's now set where it shouldn't be.

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Comment Edited] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-01-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344129#comment-16344129
 ] 

Eugene Koifman edited comment on HIVE-18575 at 1/29/18 10:36 PM:
-

 

hive.acid.table.scan and TABLE_IS_TRANSACTIONAL = "transactional" (as in 
AcidUtils.isTablePropertyTransactional), when set on Configuration, were 
always meant for full acid and only full acid.

The code running on the Grid uses these to know it's doing acid.

I think this is the part that got broken: it's now set where it shouldn't be.


was (Author: ekoifman):
 

hive.acid.table.scan and 

TABLE_IS_TRANSACTIONAL = "transactional"; as in 
AcidUtils.isTablePropertyTransactional

 

when set on Configuration were always meant for full acid and only full acid

 

I think is the part that got broken - it's now set where it shouldn't be

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Commented] (HIVE-18536) IOW + DP is broken for insert-only ACID

2018-01-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344354#comment-16344354
 ] 

Eugene Koifman commented on HIVE-18536:
---

I imagine AcidUtils can just as well be in common

> IOW + DP is broken for insert-only ACID
> ---
>
> Key: HIVE-18536
> URL: https://issues.apache.org/jira/browse/HIVE-18536
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18536.01.patch, HIVE-18536.patch
>
>






[jira] [Updated] (HIVE-18577) SemanticAnalyzer.validate has some pointless metastore calls

2018-01-30 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18577:
--
Component/s: Transactions

> SemanticAnalyzer.validate has some pointless metastore calls
> 
>
> Key: HIVE-18577
> URL: https://issues.apache.org/jira/browse/HIVE-18577
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18577.patch
>
>






[jira] [Assigned] (HIVE-18589) java.io.IOException: Not enough history available

2018-01-30 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18589:
-


> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>






[jira] [Updated] (HIVE-18589) java.io.IOException: Not enough history available

2018-01-30 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18589:
--
Status: Patch Available  (was: Open)

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18589.01.patch
>
>






[jira] [Updated] (HIVE-18589) java.io.IOException: Not enough history available

2018-01-30 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18589:
--
Attachment: HIVE-18589.01.patch

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18589.01.patch
>
>






[jira] [Updated] (HIVE-18592) DP insert on insert only table causes StatTask to fail

2018-01-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18592:
--
Component/s: Transactions

> DP insert on insert only table causes StatTask to fail
> --
>
> Key: HIVE-18592
> URL: https://issues.apache.org/jira/browse/HIVE-18592
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Zoltan Haindrich
>Priority: Major
>
> can be reproduced with:
> {code}
> set hive.mapred.mode=nonstrict;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.create.as.insert.only=true;
> set metastore.create.as.acid=true;
> drop table if exists student;
> create table student(
> name string,
> age int,
> gpa double);
> insert into student values
> ('asd',1,2),
> ('asdx',2,3),
> ('asdx',2,3),
> ('asdx',3,3),
> ('asdx',3,3),
> ('asdx',3,3);
> create table p1 (name STRING, GPA DOUBLE) PARTITIONED BY (age INT);
> SET hive.exec.dynamic.partition.mode=nonstrict;
> INSERT OVERWRITE TABLE p1 PARTITION (age) SELECT name, gpa, age FROM student;
> {code}
> causes exception
> {code}
> 2018-01-31T02:16:24,135 ERROR [22bd4065-6e2f-4f4c-8f29-8d6aad8edda8 main] 
> exec.StatsTask: Failed to run stats task
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Partition for which stats is gathered doesn't 
> exist.)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4295)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:180)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:84)
>  ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:108) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> ...
> Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: 
> Partition for which stats is gathered doesn't exist.
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:7757)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_151]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_151]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_151]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at com.sun.proxy.$Proxy38.updatePartitionColumnStatistics(Unknown 
> Source) ~[?:?]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartitonColStats(HiveMetaStore.java:5394)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:6907)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_151]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_151]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_151]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at com.sun.proxy.$Proxy40.set_aggr_stats_for(Unknown Source) ~[?:?]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:1736)
>  ~[hive-standalone-metastore-3.0.0-SNAPSHOT.jar:3.0.0-SNA

[jira] [Commented] (HIVE-18536) IOW + DP is broken for insert-only ACID

2018-01-31 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347577#comment-16347577
 ] 

Eugene Koifman commented on HIVE-18536:
---

I was asking about "Boolean baseDir" param on RB.  why is it needed?

> IOW + DP is broken for insert-only ACID
> ---
>
> Key: HIVE-18536
> URL: https://issues.apache.org/jira/browse/HIVE-18536
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18536.01.patch, HIVE-18536.02.patch, 
> HIVE-18536.patch
>
>






[jira] [Comment Edited] (HIVE-18536) IOW + DP is broken for insert-only ACID

2018-01-31 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347577#comment-16347577
 ] 

Eugene Koifman edited comment on HIVE-18536 at 1/31/18 8:39 PM:


I was asking about "Boolean baseDir" param on RB.  why is it needed?  I don't 
understand your answer


was (Author: ekoifman):
I was asking about "Boolean baseDir" param on RB.  why is it needed?

> IOW + DP is broken for insert-only ACID
> ---
>
> Key: HIVE-18536
> URL: https://issues.apache.org/jira/browse/HIVE-18536
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18536.01.patch, HIVE-18536.02.patch, 
> HIVE-18536.patch
>
>






[jira] [Commented] (HIVE-18536) IOW + DP is broken for insert-only ACID

2018-01-31 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347618#comment-16347618
 ] 

Eugene Koifman commented on HIVE-18536:
---

+1

> IOW + DP is broken for insert-only ACID
> ---
>
> Key: HIVE-18536
> URL: https://issues.apache.org/jira/browse/HIVE-18536
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18536.01.patch, HIVE-18536.02.patch, 
> HIVE-18536.patch
>
>






[jira] [Updated] (HIVE-18589) java.io.IOException: Not enough history available

2018-01-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18589:
--
Description: 
txnid:17 starts reading T2/P1

txnid:20 does insert overwrite T1/P1, creates base_20, commits.  txnid:17 still 
running

txnid:21 starts reading T1/P1.  Its ValidTxnList will have txnid:17 as open.

before Insert overwrite was supported, only the compactor could produce base_20 
by running major compaction.  Major compaction erases history and so a reader 
with txnid:17 open, can't use base_20.

Normally, the Cleaner is smart enough to not clean pre-compaction files if it's 
possible that there is a reader that requires them.  There is a safety check 
that creates "Not enough history.." error if it finds that the current reader 
can't properly execute based on the files available.

 

with the introduction of IOW on acid tables, there is another way to produce a 
base.  The difference is that here, the base has no history by definition and 
so the same check is not needed but is triggered in the scenario above.
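
The distinction drawn above can be sketched as a small predicate. This is a hedged illustration, not the committed fix: the class, the method and the producedByInsertOverwrite flag are hypothetical, and how a reader would actually learn that a base came from IOW is not specified here.

```java
// Hypothetical sketch of the "enough history" check; not Hive code.
final class BaseValiditySketch {
    // A compaction-produced base may merge data from txns that the reader's
    // snapshot still considers open, so it is only usable when every txn at
    // or below the base id is resolved from the reader's point of view.
    // A base written by Insert Overwrite has no history to lose, so the
    // check is skipped for it (assumption based on the description above).
    static boolean isUsableBase(long baseTxnId, long minOpenTxnId,
                                boolean producedByInsertOverwrite) {
        if (producedByInsertOverwrite) {
            return true;
        }
        return minOpenTxnId > baseTxnId;
    }
}
```

In the scenario above the reader (txnid:21) has txnid:17 as its minimum open txn. For a compacted base_20 the predicate is false, which corresponds to the "Not enough history" error; for a base_20 written by IOW it is true.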

 

 

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18589.01.patch
>
>
> txnid:17 starts reading T2/P1
> txnid:20 does insert overwrite T1/P1, creates base_20, commits.  txnid:17 
> still running
> txnid:21 starts reading T1/P1.  Its ValidTxnList will have txnid:17 as open.
> before Insert overwrite was supported, only the compactor could produce 
> base_20 by running major compaction.  Major compaction erases history and so 
> a reader with txnid:17 open, can't use base_20.
> Normally, the Cleaner is smart enough to not clean pre-compaction files if 
> it's possible that there is a reader that requires them.  There is a safety 
> check that creates "Not enough history.." error if it finds that the current 
> reader can't properly execute based on the files available.
>  
> with the introduction of IOW on acid tables, there is another way to produce 
> a base.  The difference is that here, the base has no history by definition 
> and so the same check is not needed but is triggered in the scenario above.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18589) java.io.IOException: Not enough history available

2018-01-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18589:
--
Attachment: HIVE-18589.02.patch

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18589.01.patch, HIVE-18589.02.patch
>
>
> txnid:17 starts reading T2/P1.
> txnid:20 does an insert overwrite of T1/P1, creates base_20, and commits.  
> txnid:17 is still running.
> txnid:21 starts reading T1/P1.  Its ValidTxnList will show txnid:17 as open.
> Before insert overwrite (IOW) was supported, only the compactor could produce 
> base_20, by running a major compaction.  Major compaction erases history, so 
> a reader that sees txnid:17 as open can't use base_20.
> Normally, the Cleaner is smart enough not to clean pre-compaction files if 
> it's possible that a reader still requires them.  There is a safety check 
> that raises a "Not enough history.." error if it finds that the current 
> reader can't execute properly based on the files available.
>  
> With the introduction of IOW on acid tables, there is another way to produce 
> a base.  The difference is that here the base has no history by definition, 
> so the check is not needed, yet it is triggered in the scenario above.
>  
>  





[jira] [Commented] (HIVE-18589) java.io.IOException: Not enough history available

2018-02-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348899#comment-16348899
 ] 

Eugene Koifman commented on HIVE-18589:
---

no related failures

[~gopalv] could you review please

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-18589.01.patch, HIVE-18589.02.patch
>
>
> txnid:17 starts reading T2/P1.
> txnid:20 does an insert overwrite of T1/P1, creates base_20, and commits.  
> txnid:17 is still running.
> txnid:21 starts reading T1/P1.  Its ValidTxnList will show txnid:17 as open.
> Before insert overwrite (IOW) was supported, only the compactor could produce 
> base_20, by running a major compaction.  Major compaction erases history, so 
> a reader that sees txnid:17 as open can't use base_20.
> Normally, the Cleaner is smart enough not to clean pre-compaction files if 
> it's possible that a reader still requires them.  There is a safety check 
> that raises a "Not enough history.." error if it finds that the current 
> reader can't execute properly based on the files available.
>  
> With the introduction of IOW on acid tables, there is another way to produce 
> a base.  The difference is that here the base has no history by definition, 
> so the check is not needed, yet it is triggered in the scenario above.
>  
>  





[jira] [Assigned] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18606:
-

Assignee: Eugene Koifman

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Component/s: Transactions

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Attachment: HIVE-18606.01.patch

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Status: Patch Available  (was: Open)

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349627#comment-16349627
 ] 

Eugene Koifman commented on HIVE-18606:
---

[~sershe] could you review please

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Attachment: HIVE-18606.02.patch

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349633#comment-16349633
 ] 

Eugene Koifman commented on HIVE-18606:
---

yes, you are right.  fixed in patch2

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Attachment: HIVE-18606.03.patch

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch, 
> HIVE-18606.03.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   // Create new tables as ACID by default.
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   // CTAS from a source table that has no rows.
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
>     at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
>     at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>     at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables

2018-02-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350945#comment-16350945
 ] 

Eugene Koifman commented on HIVE-18516:
---

+1

> load data should rename files consistent with insert statements for ACID 
> Tables
> ---
>
> Key: HIVE-18516
> URL: https://issues.apache.org/jira/browse/HIVE-18516
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, 
> HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, 
> HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, 
> HIVE-18516.8.patch, HIVE-18516.9.patch
>
>
> h1. load data should rename files consistent with insert statements for ACID 
> Tables.





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
 Release Note: N/A
   Status: Resolved  (was: Patch Available)

pushed to master

Thanks Sergey for the review

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch, 
> HIVE-18606.03.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf,
>       MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver(
>       "select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Updated] (HIVE-18221) test acid default

2018-02-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18221:
--
Status: Open  (was: Patch Available)

> test acid default
> -
>
> Key: HIVE-18221
> URL: https://issues.apache.org/jira/browse/HIVE-18221
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18221.01.patch, HIVE-18221.02.patch, 
> HIVE-18221.03.patch, HIVE-18221.04.patch, HIVE-18221.07.patch, 
> HIVE-18221.08.patch, HIVE-18221.09.patch, HIVE-18221.10.patch, 
> HIVE-18221.11.patch, HIVE-18221.12.patch, HIVE-18221.13.patch, 
> HIVE-18221.14.patch, HIVE-18221.16.patch, HIVE-18221.18.patch, 
> HIVE-18221.19.patch, HIVE-18221.20.patch, HIVE-18221.21.patch, 
> HIVE-18221.22.patch, HIVE-18221.23.patch, HIVE-18221.24.patch, 
> HIVE-18221.26.patch
>
>






[jira] [Resolved] (HIVE-18125) Support arbitrary file names in input to Load Data

2018-02-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-18125.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

fixed via HIVE-18516

> Support arbitrary file names in input to Load Data
> --
>
> Key: HIVE-18125
> URL: https://issues.apache.org/jira/browse/HIVE-18125
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18125.01.patch, HIVE-18125.02.patch
>
>
> HIVE-17361 only allows 0_0 and _0_copy_1.  Should it support 
> arbitrary names?
> If so, should it sort them and rename _0, 0001_0, etc?
> This is probably a lot easier than changing the whole code base to assign 
> proper 'bucket' (writerId) everywhere Acid reads such file.
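
The "sort them and rename" idea could be sketched roughly as below. This is only an illustration, not code from any of the attached patches: the class and method names are hypothetical, and the target pattern (six-digit index followed by _0, as in 000000_0) is an assumption about the usual Hive output file naming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class LoadDataRenameSketch {

  // Sorts the input file names and maps the i-th name to a
  // bucket-style name such as 000000_0, 000001_0, ...
  static Map<String, String> planRenames(List<String> inputNames) {
    List<String> sorted = new ArrayList<>(inputNames);
    Collections.sort(sorted);
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < sorted.size(); i++) {
      // Assumed target pattern: zero-padded index plus "_0".
      plan.put(sorted.get(i), String.format("%06d_0", i));
    }
    return plan;
  }

  public static void main(String[] args) {
    // Arbitrary input names, as a user might pass to Load Data.
    System.out.println(
        planRenames(Arrays.asList("part-b.orc", "part-a.orc", "data.orc")));
  }
}
```

Sorting first keeps the mapping deterministic, so loading the same set of files always yields the same names.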





[jira] [Updated] (HIVE-18589) java.io.IOException: Not enough history available

2018-02-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18589:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Gopal for the review

> java.io.IOException: Not enough history available
> -
>
> Key: HIVE-18589
> URL: https://issues.apache.org/jira/browse/HIVE-18589
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18589.01.patch, HIVE-18589.02.patch
>
>
> txnid:17 starts reading T2/P1
> txnid:20 does insert overwrite T1/P1, creates base_20, commits.  txnid:17 
> still running
> txnid:21 starts reading T1/P1.  Its ValidTxnList will have txnid:17 as open.
> before Insert overwrite was supported, only the compactor could produce 
> base_20 by running major compaction.  Major compaction erases history and so 
> a reader with txnid:17 open, can't use base_20.
> Normally, the Cleaner is smart enough to not clean pre-compaction files if 
> it's possible that there is a reader that requires them.  There is a safety 
> check that creates "Not enough history.." error if it finds that the current 
> reader can't properly execute based on the files available.
>  
> with the introduction of IOW on acid tables, there is another way to produce 
> a base.  The difference is that here, the base has no history by definition 
> and so the same check is not needed but is triggered in the scenario above.
>  
>  
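
The distinction described above can be modelled in a few lines. This is a simplified sketch, not the actual Hive check: the class, enum and method names are hypothetical, and the reader's snapshot is reduced to a sorted set of open transaction ids (a real ValidTxnList carries more state).

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of the "Not enough history" decision.
final class BaseHistoryCheckSketch {

  enum BaseOrigin { MAJOR_COMPACTION, INSERT_OVERWRITE }

  // Returns true if a reader whose snapshot lists openTxnIds as open
  // may read base_<baseTxnId>.
  static boolean readerCanUseBase(long baseTxnId, BaseOrigin origin,
                                  SortedSet<Long> openTxnIds) {
    if (origin == BaseOrigin.INSERT_OVERWRITE) {
      // An IOW base has no history folded into it, so only the
      // writing transaction itself has to be visible to the reader.
      return !openTxnIds.contains(baseTxnId);
    }
    // A compacted base merges every txn up to baseTxnId, so none of
    // them may still be open from the reader's point of view.
    return openTxnIds.isEmpty() || openTxnIds.first() > baseTxnId;
  }

  public static void main(String[] args) {
    SortedSet<Long> open = new TreeSet<>();
    open.add(17L); // txnid:17 is still running
    // base_20 from major compaction: not usable, prints false
    System.out.println(readerCanUseBase(20L, BaseOrigin.MAJOR_COMPACTION, open));
    // base_20 from insert overwrite: usable, prints true
    System.out.println(readerCanUseBase(20L, BaseOrigin.INSERT_OVERWRITE, open));
  }
}
```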





[jira] [Updated] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18575:
--
Component/s: Transactions

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Commented] (HIVE-18128) Setting AcidUtils.setTransactionalTableScan in HiveInputFormat causes downstream errors

2018-02-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352746#comment-16352746
 ] 

Eugene Koifman commented on HIVE-18128:
---

fixed in HIVE-18575?

> Setting AcidUtils.setTransactionalTableScan in HiveInputFormat causes 
> downstream errors
> ---
>
> Key: HIVE-18128
> URL: https://issues.apache.org/jira/browse/HIVE-18128
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18128.WIP.patch
>
>
> This should really be set in addSplitsForGroup().  See attached patch for 
> details





[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352769#comment-16352769
 ] 

Eugene Koifman commented on HIVE-18575:
---

I left some comments on RB.  Mostly, I think the change in TableScanDesc() 
doesn't do the right thing.

 

Also, the test for full acid is sometimes called isAcidTable() and sometimes 
isFullAcidTable() - either is fine but I think it's confusing to have both.

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Commented] (HIVE-16605) Enforce NOT NULL constraints

2018-02-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352863#comment-16352863
 ] 

Eugene Koifman commented on HIVE-16605:
---

merge_constraint_notnull.q sets hive.merge.cardinality.check=false

is that intentional?  if so why?

> Enforce NOT NULL constraints
> 
>
> Key: HIVE-16605
> URL: https://issues.apache.org/jira/browse/HIVE-16605
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-16605.1.patch, HIVE-16605.2.patch, 
> HIVE-16605.3.patch, HIVE-16605.4.patch
>
>
> Since NOT NULL is so common it would be great to have tables start to enforce 
> that.
> [~ekoifman] described a possible approach in HIVE-16575:
> {quote}
> One way to enforce not null constraint is to have the optimizer add 
> enforce_not_null UDF which throws if it sees a NULL, otherwise it's pass 
> through.
> So if 'b' has not null constraint,
> Insert into T select a,b,c... would become
> Insert into T select a, enforce_not_null(b), c.
> This would work for any table type.
> {quote}
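
The pass-through behaviour in the quote can be illustrated with plain Java. This is a stand-in only: it does not use the Hive UDF API, and the class name, method name and error message are hypothetical.

```java
// Hypothetical stand-in for the proposed enforce_not_null function.
final class EnforceNotNullSketch {

  // Returns the value unchanged, or throws if it is null.
  static <T> T enforceNotNull(T value, String columnName) {
    if (value == null) {
      throw new IllegalStateException(
          "NOT NULL constraint violated for column " + columnName);
    }
    return value;
  }

  public static void main(String[] args) {
    // Mirrors: Insert into T select a, enforce_not_null(b), c
    Integer a = null;   // 'a' has no constraint, so null is allowed
    Integer b = 42;     // 'b' is declared NOT NULL
    String c = "x";
    System.out.println(a + ", " + enforceNotNull(b, "b") + ", " + c);
  }
}
```

Because the function returns its argument untouched in the non-null case, wrapping a column in it does not change the rows that get written, which is why the approach is independent of the table type.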





[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354201#comment-16354201
 ] 

Eugene Koifman commented on HIVE-18575:
---

what was the point of renaming isAcid to isFullAcid?  {{isAcid}} was used 
everywhere to mean full acid table.

it now leads to code like 
"compBuilder.setIsAcid(AcidUtils.isTransactionalTable(t));" which will cause 
confusion?

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.01.patch, HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Assigned] (HIVE-18636) fix TestTxnNoBuckets.testCTAS - keeps failing on ptest

2018-02-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18636:
-

Assignee: Eugene Koifman

> fix TestTxnNoBuckets.testCTAS - keeps failing on ptest
> --
>
> Key: HIVE-18636
> URL: https://issues.apache.org/jira/browse/HIVE-18636
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> need to update expected result





[jira] [Updated] (HIVE-18636) fix TestTxnNoBuckets.testCTAS - keeps failing on ptest

2018-02-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18636:
--
Priority: Blocker  (was: Major)

> fix TestTxnNoBuckets.testCTAS - keeps failing on ptest
> --
>
> Key: HIVE-18636
> URL: https://issues.apache.org/jira/browse/HIVE-18636
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> need to update expected result





[jira] [Commented] (HIVE-18643) don't check for archived partitions for ACID ops

2018-02-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354840#comment-16354840
 ] 

Eugene Koifman commented on HIVE-18643:
---

+1

> don't check for archived partitions for ACID ops
> 
>
> Key: HIVE-18643
> URL: https://issues.apache.org/jira/browse/HIVE-18643
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18643.patch
>
>






[jira] [Updated] (HIVE-18636) fix TestTxnNoBuckets.testCTAS - keeps failing on ptest

2018-02-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18636:
--
Attachment: HIVE-18636.01.patch

> fix TestTxnNoBuckets.testCTAS - keeps failing on ptest
> --
>
> Key: HIVE-18636
> URL: https://issues.apache.org/jira/browse/HIVE-18636
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-18636.01.patch
>
>
> need to update expected result





[jira] [Updated] (HIVE-18636) fix TestTxnNoBuckets.testCTAS - keeps failing on ptest

2018-02-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18636:
--
Status: Patch Available  (was: Open)

[~sershe] could you review please

> fix TestTxnNoBuckets.testCTAS - keeps failing on ptest
> --
>
> Key: HIVE-18636
> URL: https://issues.apache.org/jira/browse/HIVE-18636
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-18636.01.patch
>
>
> need to update expected result





[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355929#comment-16355929
 ] 

Eugene Koifman commented on HIVE-18575:
---

made some comments on RB

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.01.patch, HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Issue Comment Deleted] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18575:
--
Comment: was deleted

(was: what was the point of renaming isAcid to isFullAcid?  {{isAcid}} was used 
everywhere to mean full acid table.

it now leads to code like 
"compBuilder.setIsAcid(AcidUtils.isTransactionalTable(t));" which will cause 
confusion?)

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.01.patch, HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Commented] (HIVE-18619) Verification of temporary Micromanaged table atomicity is needed

2018-02-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355950#comment-16355950
 ] 

Eugene Koifman commented on HIVE-18619:
---

to test this in UT you can use HiveConf.HIVETESTMODEROLLBACKTXN

after you set it, commit will fail - it's used to simulate failures

> Verification of temporary Micromanaged table atomicity is needed 
> -
>
> Key: HIVE-18619
> URL: https://issues.apache.org/jira/browse/HIVE-18619
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
>
> Session based temporary table by HIVE-7090 had no consideration of 
> Micromanaged table 
> (MM) since there was no insert-only ACID table at its creation time. 
> HIVE-18599 addressed the issue of no writes during CTTAS (Create Temporary 
> Table As Select)
> on Micro-Managed table. But atomicity of temporary MM table is not verified. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18649) HiveInputFormat job conf object lifecycle is unclear or broken

2018-02-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18649:
--
Component/s: Transactions

> HiveInputFormat job conf object lifecycle is unclear or broken
> --
>
> Key: HIVE-18649
> URL: https://issues.apache.org/jira/browse/HIVE-18649
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Follow-up from  HIVE-18575
> ACID properties may be added to the same job object for multiple tables, at 
> least by the looks of it.
> There also exists a JobConf field "job" in HIF; and a separate JobConf input 
> argument to some methods. These methods apply some changes to one jobconf and 
> some to another, for no clear reason.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18636) fix TestTxnNoBuckets.testCTAS - keeps failing on ptest

2018-02-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18636:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed to master

thanks Sergey for the review

> fix TestTxnNoBuckets.testCTAS - keeps failing on ptest
> --
>
> Key: HIVE-18636
> URL: https://issues.apache.org/jira/browse/HIVE-18636
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-18636.01.patch
>
>
> need to update expected result



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17482) External LLAP client: acquire locks for tables queried directly by LLAP

2017-09-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184847#comment-16184847
 ] 

Eugene Koifman commented on HIVE-17482:
---

+1

> External LLAP client: acquire locks for tables queried directly by LLAP
> ---
>
> Key: HIVE-17482
> URL: https://issues.apache.org/jira/browse/HIVE-17482
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17482.1.patch, HIVE-17482.2.patch, 
> HIVE-17482.3.patch, HIVE-17482.4.patch, HIVE-17482.5.patch, HIVE-17482.6.patch
>
>
> When using the LLAP external client with simple queries (filter/project of 
> single table), the appropriate locks should be taken on the table being read 
> like they are for normal Hive queries. This is important in the case of 
> transactional tables being queried, since the compactor relies on the 
> presence of table locks to determine whether it can safely delete old 
> versions of compacted files without affecting currently running queries.
> This does not have to happen in the complex query case, since a query is used 
> (with the appropriate locking mechanisms) to create/populate the temp table 
> holding the results to the complex query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17645) MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)

2017-09-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17645:
--
Summary: MM tables patch conflicts with HIVE-17482 (Spark/Acid integration) 
 (was: This conflicts with HIVE-17482 (Spark/Acid integration))

> MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)
> --
>
> Key: HIVE-17645
> URL: https://issues.apache.org/jira/browse/HIVE-17645
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>
> MM code introduces 
> {noformat}
> HiveTxnManager txnManager = SessionState.get().getTxnMgr()
> {noformat}
> in a number of places.  HIVE-17482 adds a mode where a TransactionManager not 
> associated with the session should be used.  This will need to be addressed.





[jira] [Updated] (HIVE-17645) MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)

2017-09-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17645:
--
Description: 
MM code introduces 
{noformat}
HiveTxnManager txnManager = SessionState.get().getTxnMgr()
{noformat}

in a number of places (e.g _DDLTask.generateAddMmTasks(Table tbl)_).  
HIVE-17482 adds a mode where a TransactionManager not associated with the 
session should be used.  This will need to be addressed.

  was:
MM code introduces 
{noformat}
HiveTxnManager txnManager = SessionState.get().getTxnMgr()
{noformat}

in a number of places.  HIVE-17482 adds a mode where a TransactionManager not 
associated with the session should be used.  This will need to be addressed.


> MM tables patch conflicts with HIVE-17482 (Spark/Acid integration)
> --
>
> Key: HIVE-17645
> URL: https://issues.apache.org/jira/browse/HIVE-17645
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>
> MM code introduces 
> {noformat}
> HiveTxnManager txnManager = SessionState.get().getTxnMgr()
> {noformat}
> in a number of places (e.g _DDLTask.generateAddMmTasks(Table tbl)_).  
> HIVE-17482 adds a mode where a TransactionManager not associated with the 
> session should be used.  This will need to be addressed.





[jira] [Updated] (HIVE-17658) Bucketed/Sorted tables - SMB join

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17658:
--
Summary: Bucketed/Sorted tables - SMB join  (was: Bucketed/Sorted tables)

> Bucketed/Sorted tables - SMB join
> -
>
> Key: HIVE-17658
> URL: https://issues.apache.org/jira/browse/HIVE-17658
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> How does this handle tables that are bucketed + sorted?
> insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
> insert into T values(3,4),(7,8) creates delta_3_3/bucket_1
> the expectation for any reader would be to see some contiguous subset of 
> (1,2),(3,4),(5,6),(7,8)
> but this would require a special reader which I don't see





[jira] [Updated] (HIVE-17658) Bucketed/Sorted tables - SMB join

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17658:
--
Description: 
How does this handle tables that are bucketed + sorted?
insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
insert into T values(3,4),(7,8) creates delta_3_3/bucket_1

the expectation for any reader would be to see some contiguous subset of 
(1,2),(3,4),(5,6),(7,8)

but this would require a special reader which I don't see

In particular it's not clear how SMB join can work


  was:
How does this handle tables that are bucketed + sorted?
insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
insert into T values(3,4),(7,8) creates delta_3_3/bucket_1

the expectation for any reader would be to see some contiguous subset of 
(1,2),(3,4),(5,6),(7,8)

but this would require a special reader which I don't see



> Bucketed/Sorted tables - SMB join
> -
>
> Key: HIVE-17658
> URL: https://issues.apache.org/jira/browse/HIVE-17658
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> How does this handle tables that are bucketed + sorted?
> insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
> insert into T values(3,4),(7,8) creates delta_3_3/bucket_1
> the expectation for any reader would be to see some contiguous subset of 
> (1,2),(3,4),(5,6),(7,8)
> but this would require a special reader which I don't see
> In particular it's not clear how SMB join can work
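
The "special reader" mentioned above would have to merge the per-delta files, each sorted on its own, into one sorted stream. A minimal sketch of such a k-way merge follows; it is an assumption about what that reader might do, not existing Hive code, and all names are hypothetical. Rows are modelled as int arrays sorted by their first element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over individually sorted delta files.
final class SortedDeltaMergeSketch {

  // Read position within one delta's rows.
  private static final class Cursor {
    final List<int[]> rows;
    int pos;
    Cursor(List<int[]> rows) { this.rows = rows; }
    int[] current() { return rows.get(pos); }
  }

  // Merges deltas (each sorted by row[0]) into one list sorted by row[0].
  static List<int[]> merge(List<List<int[]>> deltas) {
    PriorityQueue<Cursor> heap = new PriorityQueue<>(
        Comparator.comparingInt((Cursor c) -> c.current()[0]));
    for (List<int[]> delta : deltas) {
      if (!delta.isEmpty()) {
        heap.add(new Cursor(delta));
      }
    }
    List<int[]> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();      // cursor with the smallest current key
      out.add(c.current());
      c.pos++;
      if (c.pos < c.rows.size()) {
        heap.add(c);               // re-insert with its next row as key
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // delta_2_2/bucket_1 holds (1,2),(5,6); delta_3_3/bucket_1 holds (3,4),(7,8)
    List<int[]> d2 = Arrays.asList(new int[] {1, 2}, new int[] {5, 6});
    List<int[]> d3 = Arrays.asList(new int[] {3, 4}, new int[] {7, 8});
    for (int[] row : merge(Arrays.asList(d2, d3))) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```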





[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Priority: Critical  (was: Major)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17361) Support LOAD DATA for transactional tables

2017-09-29 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186760#comment-16186760
 ] 

Eugene Koifman commented on HIVE-17361:
---

see mm_loaddata.q for various examples - in particular it's possible to load 
multiple files at once

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.





[jira] [Updated] (HIVE-16850) Only open a new transaction when there's no currently opened transaction

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16850:
--
Component/s: Transactions

> Only open a new transaction when there's no currently opened transaction
> 
>
> Key: HIVE-16850
> URL: https://issues.apache.org/jira/browse/HIVE-16850
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Wei Zheng
> Fix For: hive-14535
>
> Attachments: HIVE-16850.patch
>
>






[jira] [Reopened] (HIVE-16850) Only open a new transaction when there's no currently opened transaction

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reopened HIVE-16850:
---
  Assignee: (was: Wei Zheng)

This cannot be right.  It should throw if there is no open transaction

> Only open a new transaction when there's no currently opened transaction
> 
>
> Key: HIVE-16850
> URL: https://issues.apache.org/jira/browse/HIVE-16850
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Wei Zheng
> Fix For: hive-14535
>
> Attachments: HIVE-16850.patch
>
>






[jira] [Updated] (HIVE-17657) Does ExIm work?

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17657:
--
Description: 
there is mm_exim.q but it's not clear from the tests what file structure it 
creates 
On import the txnids in the directory names would have to be remapped if 
importing to a different cluster.  Perhaps export can be smart and export 
highest base_x and accretive deltas (minus aborted ones).  Then import can ...? 
 It would have to remap txn ids from the archive to new txn ids.  This would 
then mean that import is made up of several transactions rather than 1 atomic 
op.  (all locks must belong to a transaction)

One possibility is to open a new txn for each dir in the archive (where 
start/end txn of file name is the same) and commit all of them at once (need 
new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
allows other inserts (not related to import) to occur.
What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
that this must mean that there is no delta_6_6 or any other "obsolete" delta in 
the archive we can map it to a new single txn delta_x_x.

  was:
there is mm_exim.q but it's not clear from the tests what file structure it 
creates 
On import the txnids in the directory names would have to be remapped if 
importing to a different cluster.  Perhaps export can be smart and export 
highest base_x and accretive deltas (minus aborted ones).  Then import can ...? 
 It would have to remap txn ids from the archive to new txn ids.  This would 
then mean that import is made up of several transactions rather than 1 atomic 
op.  (all locks must belong to a transaction)


> Does ExIm work?
> ---
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
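
The remapping idea can be sketched as follows. This is only an illustration under assumptions: `DeltaRemapSketch`, `remap`, and the `openTxn` supplier are hypothetical stand-ins (the supplier plays the role of "open a new txn and return its id"), directory names are not zero-padded, and the stipulation above that a range delta such as delta_6_9 has no obsolete siblings in the archive is taken as given.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: maps each delta directory in an export archive to a
// new single-transaction delta_x_x name on the importing cluster.
final class DeltaRemapSketch {

    private static final Pattern DELTA = Pattern.compile("delta_(\\d+)_(\\d+)");

    // openTxn is an assumed stand-in for opening a new transaction and
    // returning its id; committing all of them at once is out of scope here.
    static Map<String, String> remap(List<String> archiveDirs, LongSupplier openTxn) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String dir : archiveDirs) {
            Matcher m = DELTA.matcher(dir);
            if (!m.matches()) {
                throw new IllegalArgumentException("not a delta dir: " + dir);
            }
            // Both delta_5_5 and a range delta like delta_6_9 collapse to one new txn.
            long newTxn = openTxn.getAsLong();
            result.put(dir, "delta_" + newTxn + "_" + newTxn);
        }
        return result;
    }
}
```

The sketch makes the atomicity problem visible: one import consumes several transaction ids, so something else (the proposed commit-all-at-once API, or a lock) has to make them appear together.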





[jira] [Updated] (HIVE-17657) Does ExIm work?

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17657:
--
Description: 
there is mm_exim.q but it's not clear from the tests what file structure it 
creates 
On import the txnids in the directory names would have to be remapped if 
importing to a different cluster.  Perhaps export can be smart and export 
highest base_x and accretive deltas (minus aborted ones).  Then import can ...? 
 It would have to remap txn ids from the archive to new txn ids.  This would 
then mean that import is made up of several transactions rather than 1 atomic 
op.  (all locks must belong to a transaction)

One possibility is to open a new txn for each dir in the archive (where 
start/end txn of file name is the same) and commit all of them at once (need 
new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
allows other inserts (not related to import) to occur.
What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
that this must mean that there is no delta_6_6 or any other "obsolete" delta in 
the archive we can map it to a new single txn delta_x_x.

Add read_only mode for tables (useful in general, may be needed for upgrade 
etc) and use that to make the above atomic.

  was:
there is mm_exim.q but it's not clear from the tests what file structure it 
creates 
On import the txnids in the directory names would have to be remapped if 
importing to a different cluster.  Perhaps export can be smart and export 
highest base_x and accretive deltas (minus aborted ones).  Then import can ...? 
 It would have to remap txn ids from the archive to new txn ids.  This would 
then mean that import is made up of several transactions rather than 1 atomic 
op.  (all locks must belong to a transaction)

One possibility is to open a new txn for each dir in the archive (where 
start/end txn of file name is the same) and commit all of them at once (need 
new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
allows other inserts (not related to import) to occur.
What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
that this must mean that there is no delta_6_6 or any other "obsolete" delta in 
the archive we can map it to a new single txn delta_x_x.


> Does ExIm work?
> ---
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
> Add read_only mode for tables (useful in general, may be needed for upgrade 
> etc) and use that to make the above atomic.





[jira] [Assigned] (HIVE-12375) ensure hive.compactor.check.interval cannot be set too low

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-12375:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

> ensure hive.compactor.check.interval cannot be set too low
> --
>
> Key: HIVE-12375
> URL: https://issues.apache.org/jira/browse/HIVE-12375
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Steve Yeom
> Attachments: HIVE-12375.2.patch, HIVE-12375.3.patch, 
> HIVE-12375.4.patch, HIVE-12375.patch
>
>
> hive.compactor.check.interval can currently be set to as low as 0, which 
> makes the Initiator spin needlessly, filling up logs, etc.
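
A minimal sketch of the kind of guard this asks for, assuming a floor of one second; the floor value and the names `CheckIntervalSketch`, `MIN_INTERVAL_MS`, and `effectiveIntervalMs` are hypothetical and not taken from any patch on this issue.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: clamp a configured check interval to a minimum so a
// value of 0 cannot turn the polling loop into a busy spin.
final class CheckIntervalSketch {

    // Assumed floor for illustration only.
    static final long MIN_INTERVAL_MS = TimeUnit.SECONDS.toMillis(1);

    static long effectiveIntervalMs(long configuredMs) {
        return Math.max(configuredMs, MIN_INTERVAL_MS);
    }
}
```

Clamping silently is one design choice; rejecting the configuration value with an error at startup would be another.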





[jira] [Updated] (HIVE-17657) Does ExIm for MM tables work?

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17657:
--
Summary: Does ExIm for MM tables work?  (was: Does ExIm work?)

> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
> Add read_only mode for tables (useful in general, may be needed for upgrade 
> etc) and use that to make the above atomic.





[jira] [Updated] (HIVE-17547) MoveTask for Acid tables race condition

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17547:
--
Description: 
Consider Hive.moveAcidFiles()
it starts out with something like
{noformat}
  └── -ext-1
      ├── 00_0
      │   ├── _orc_acid_version
      │   └── delta_019_019
      │       └── bucket_0
      └── 00_1
          ├── _orc_acid_version
          └── delta_019_019
              └── bucket_1
{noformat}
for a write to a bucketed table.
The "move" handles each 00_N separately.  The first one creates 
delta_019_019 under the table/partition dir, the others just add 
bucket_N there.
That means there is a small window where someone may "ls 
table/part/delta_019_019" and not see all the buckets.

Once Acid writes directly to the final location (a la MM tables) this issue 
resolves automatically since txn 19 is uncommitted until everything is written.

  was:
Consider Hive.moveAcidFiles()
it starts out with something like
{noformat}
  └── -ext-1
      ├── 00_0
      │   ├── _orc_acid_version
      │   └── delta_019_019
      │       └── bucket_0
      └── 00_1
          ├── _orc_acid_version
          └── delta_019_019
              └── bucket_1
{noformat}
for a write to a bucketed table.
The "move" handles each 00_N separately.  The first one creates 
delta_019_019 under the table/partition dir, the others just add 
bucket_N there.
That means there is a small window where someone may "ls 
table/part/delta_019_019" and not see all the buckets.


> MoveTask for Acid tables race condition
> ---
>
> Key: HIVE-17547
> URL: https://issues.apache.org/jira/browse/HIVE-17547
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Consider Hive.moveAcidFiles()
> it starts out with something like
> {noformat}
>   └── -ext-1
>       ├── 00_0
>       │   ├── _orc_acid_version
>       │   └── delta_019_019
>       │       └── bucket_0
>       └── 00_1
>           ├── _orc_acid_version
>           └── delta_019_019
>               └── bucket_1
> {noformat}
> for a write to a bucketed table.
> The "move" handles each 00_N separately.  The first one creates 
> delta_019_019 under the table/partition dir, the others just add 
> bucket_N there.
> That means there is a small window where someone may "ls 
> table/part/delta_019_019" and not see all the buckets.
> Once Acid writes directly to the final location (a la MM tables) this issue 
> resolves automatically since txn 19 is uncommitted until everything is 
> written.





[jira] [Assigned] (HIVE-17391) Compaction fails if there is an empty value in tblproperties

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17391:
-

Assignee: Eugene Koifman

> Compaction fails if there is an empty value in tblproperties
> 
>
> Key: HIVE-17391
> URL: https://issues.apache.org/jira/browse/HIVE-17391
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Eugene Koifman
>
> create table t1 (a int) tblproperties ('serialization.null.format'='');
> alter table t1 compact 'major';
> fails





[jira] [Assigned] (HIVE-17391) Compaction fails if there is an empty value in tblproperties

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17391:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

> Compaction fails if there is an empty value in tblproperties
> 
>
> Key: HIVE-17391
> URL: https://issues.apache.org/jira/browse/HIVE-17391
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Steve Yeom
>
> create table t1 (a int) tblproperties ('serialization.null.format'='');
> alter table t1 compact 'major';
> fails





[jira] [Assigned] (HIVE-17391) Compaction fails if there is an empty value in tblproperties

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17391:
-

Assignee: Steve Yeom  (was: Steve Yeom)

> Compaction fails if there is an empty value in tblproperties
> 
>
> Key: HIVE-17391
> URL: https://issues.apache.org/jira/browse/HIVE-17391
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Ashutosh Chauhan
>Assignee: Steve Yeom
>
> create table t1 (a int) tblproperties ('serialization.null.format'='');
> alter table t1 compact 'major';
> fails





[jira] [Assigned] (HIVE-17349) Enforce 1 TransactionBatch per StreamingConnection

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17349:
-

Assignee: Eugene Koifman

> Enforce 1 TransactionBatch per StreamingConnection
> --
>
> Key: HIVE-17349
> URL: https://issues.apache.org/jira/browse/HIVE-17349
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> There is a comment on _hcatalog.streaming.StreamingConnection_
> {noformat}
> Note: the expectation is that there is at most 1 TransactionBatch outstanding 
> for any given
>  * StreamingConnection.  Violating this may result in "out of sequence 
> response".
> {noformat}
> Why not enforce this in the code...
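
Enforcing the rule could look roughly like the sketch below. It is a hypothetical model, not the hcatalog.streaming API: `SingleBatchConnectionSketch`, `Batch`, and `fetchBatch` are made-up names, and the only behavior shown is that a second batch is refused until the outstanding one is closed.

```java
// Hypothetical model, not the hcatalog.streaming API: a connection that
// allows at most one outstanding batch at a time.
final class SingleBatchConnectionSketch {

    // Stand-in for a TransactionBatch; closing it frees the connection's slot.
    final class Batch implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            synchronized (SingleBatchConnectionSketch.this) {
                if (!closed) {
                    closed = true;
                    outstanding = null;
                }
            }
        }
    }

    private Batch outstanding;

    synchronized Batch fetchBatch() {
        if (outstanding != null) {
            // Fail fast instead of risking an "out of sequence response" later.
            throw new IllegalStateException(
                "a TransactionBatch is already outstanding on this connection");
        }
        outstanding = new Batch();
        return outstanding;
    }
}
```

Failing at fetch time turns a hard-to-diagnose protocol error into an immediate, local one, which seems to be the point of the suggestion.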





[jira] [Assigned] (HIVE-17349) Enforce 1 TransactionBatch per StreamingConnection

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17349:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

> Enforce 1 TransactionBatch per StreamingConnection
> --
>
> Key: HIVE-17349
> URL: https://issues.apache.org/jira/browse/HIVE-17349
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Reporter: Eugene Koifman
>Assignee: Steve Yeom
>Priority: Minor
>
> There is a comment on _hcatalog.streaming.StreamingConnection_
> {noformat}
> Note: the expectation is that there is at most 1 TransactionBatch outstanding 
> for any given
>  * StreamingConnection.  Violating this may result in "out of sequence 
> response".
> {noformat}
> Why not enforce this in the code...





[jira] [Assigned] (HIVE-17306) Support MySQL InnoDB Cluster

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17306:
-

Assignee: Eugene Koifman

> Support MySQL InnoDB Cluster
> 
>
> Key: HIVE-17306
> URL: https://issues.apache.org/jira/browse/HIVE-17306
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Shawn Weeks
>Assignee: Eugene Koifman
>Priority: Minor
>
> To support high availability of the Hive Metastore, a highly available 
> database is required. To support the MySQL InnoDB Cluster it looks like we're 
> just missing a couple primary keys as we were already using InnoDB tables for 
> the metastore. It looks like it's primarily the transaction tables that don't 
> have primary keys like TXN_COMPONENTS and COMPLETED_TXN_COMPONENTS. The 
> primary keys can be surrogate sequences if there really is no unique 
> identifier in these tables.





[jira] [Assigned] (HIVE-17306) Support MySQL InnoDB Cluster

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17306:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

> Support MySQL InnoDB Cluster
> 
>
> Key: HIVE-17306
> URL: https://issues.apache.org/jira/browse/HIVE-17306
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Shawn Weeks
>Assignee: Steve Yeom
>Priority: Minor
>
> To support high availability of the Hive Metastore, a highly available 
> database is required. To support the MySQL InnoDB Cluster it looks like we're 
> just missing a couple primary keys as we were already using InnoDB tables for 
> the metastore. It looks like it's primarily the transaction tables that don't 
> have primary keys like TXN_COMPONENTS and COMPLETED_TXN_COMPONENTS. The 
> primary keys can be surrogate sequences if there really is no unique 
> identifier in these tables.





[jira] [Assigned] (HIVE-17232) "No match found" Compactor finds a bucket file thinking it's a directory

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17232:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

>  "No match found"  Compactor finds a bucket file thinking it's a directory
> --
>
> Key: HIVE-17232
> URL: https://issues.apache.org/jira/browse/HIVE-17232
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Steve Yeom
>
> {noformat}
> 2017-08-02T12:38:11,996  WARN [main] compactor.CompactorMR: Found a 
> non-bucket file that we thought matched the bucket pattern! 
> file:/Users/ekoifman/dev/hiv\
> erwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands2-1501702264311/warehouse/acidtblpart/p=1/delta_013_013_/bucket_1
>  Matcher=java\
> .util.regex.Matcher[pattern=^[0-9]{6} region=0,12 lastmatch=]
> 2017-08-02T12:38:11,996  INFO [main] mapreduce.JobSubmitter: Cleaning up the 
> staging area 
> file:/tmp/hadoop/mapred/staging/ekoifman1723152463/.staging/job_lo\
> cal1723152463_0183
> 2017-08-02T12:38:11,997 ERROR [main] compactor.Worker: Caught exception while 
> trying to compact 
> id:1,dbname:default,tableName:ACIDTBLPART,partName:null,stat\
> e:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.
>   Marking failed to avoid repeated failures, java.lang.IllegalStateException: 
> \
> No match found
> at java.util.regex.Matcher.group(Matcher.java:536)
> at java.util.regex.Matcher.group(Matcher.java:496)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.addFileToMap(CompactorMR.java:577)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorInputFormat.getSplits(CompactorMR.java:549)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:330)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:320)
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:275)
> at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:166)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runWorker(TestTxnCommands2.java:1138)
> at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.updateDeletePartitioned(TestTxnCommands2.java:894)
> {noformat}
> the stack trace points to 1st runWorker() in updateDeletePartitioned() though 
> the test run was TestTxnCommands2WithSplitUpdateAndVectorization
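
The `IllegalStateException: No match found` in the trace is what `java.util.regex.Matcher.group()` throws when it is called on a matcher whose last match attempt did not succeed. A minimal sketch of that failure mode and a guarded alternative follows; the pattern and the names `BucketNameSketch`, `bucketIdUnchecked`, and `bucketId` are assumptions for illustration and are not the code in CompactorMR.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the failure mode, not the CompactorMR code.
final class BucketNameSketch {

    // Assumed pattern for illustration only.
    private static final Pattern BUCKET = Pattern.compile("^bucket_([0-9]+)$");

    // Ignores the result of matches(); group() then throws
    // IllegalStateException("No match found") for a non-matching name.
    static int bucketIdUnchecked(String fileName) {
        Matcher m = BUCKET.matcher(fileName);
        m.matches();
        return Integer.parseInt(m.group(1));
    }

    // Only touches group() after a successful match.
    static OptionalInt bucketId(String fileName) {
        Matcher m = BUCKET.matcher(fileName);
        return m.matches()
            ? OptionalInt.of(Integer.parseInt(m.group(1)))
            : OptionalInt.empty();
    }
}
```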





[jira] [Assigned] (HIVE-16964) _orc_acid_version file is missing

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-16964:
-

Assignee: Steve Yeom  (was: Eugene Koifman)

> _orc_acid_version file is missing
> -
>
> Key: HIVE-16964
> URL: https://issues.apache.org/jira/browse/HIVE-16964
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Steve Yeom
>
> OrcRecordUpdater creates OrcRecordUpdater.ACID_FORMAT in the dir that it 
> creates - but there is nothing in Hive.moveAcidFiles() that copies it to its 
> final location.
> It doesn't look like CompactorMR even attempts to create it.





[jira] [Updated] (HIVE-16361) Automatically kill runaway client processes

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16361:
--
Description: 
HIVE-13249 added an enforceable limit on how many transactions can be opened 
concurrently where the system starts to reject new work to prevent the system 
getting to a point where it cannot manage the load.

Another condition to guard against is a runaway process (which would usually be 
some app (e.g. Storm) using Streaming Ingest API) that creates a very large 
number of transactions very quickly, all of which immediately get aborted due to 
some misconfiguration.  This can cause a large amount of metadata to accumulate 
in the ACID system, slowing everything down and causing instability.


Now that we have TXNS.TXN_AGENT_INFO information we could probably use that to 
refuse work from a client even before we open any txns if it passes some 
"runaway client" heuristic.

This is like an unintentional DOS attack

  was:
HIVE-13249 added an enforceable limit on how many transactions can be opened 
concurrently where the system starts to reject new work to prevent the system 
getting to a point where it cannot manage the load.

Another condition to guard against is a runaway process (which would usually be 
some app (e.g. Storm) using Streaming Ingest API) that creates a very large 
number of transactions very quickly, all of which immediately get aborted due to 
some misconfiguration.  This can cause a large amount of metadata to accumulate 
in the ACID system, slowing everything down and causing instability.


Now that we have TXNS.TXN_AGENT_INFO information we could probably use that 
refuse work from a client even before we open any txns if it passes some 
"runaway client" heuristic.

This is like an unintentional DOS attack


> Automatically kill runaway client processes 
> 
>
> Key: HIVE-16361
> URL: https://issues.apache.org/jira/browse/HIVE-16361
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Priority: Critical
>
> HIVE-13249 added an enforceable limit on how many transactions can be opened 
> concurrently where the system starts to reject new work to prevent the system 
> getting to a point where it cannot manage the load.
> Another condition to guard against is a runaway process (which would usually 
> be some app (e.g. Storm) using Streaming Ingest API) that creates a very large 
> number of transactions very quickly, all of which immediately get aborted due 
> to some misconfiguration.  This can cause a large amount of metadata to 
> accumulate in the ACID system, slowing everything down and causing instability.
> Now that we have TXNS.TXN_AGENT_INFO information we could probably use that 
> to refuse work from a client even before we open any txns if it passes some 
> "runaway client" heuristic.
> This is like an unintentional DOS attack
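
The description leaves the "runaway client" heuristic open. One possible shape, purely as a sketch, is a sliding-window count of aborted transactions per agent; the class `RunawayClientSketch`, its thresholds, and the idea of keying on the agent-info string are all assumptions made here for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical heuristic: refuse work from an agent that has had more than
// maxAbortsInWindow aborted transactions within the last windowMs.
final class RunawayClientSketch {

    private final int maxAbortsInWindow;
    private final long windowMs;
    // Abort timestamps per agent, oldest first.
    private final Map<String, Deque<Long>> abortsByAgent = new HashMap<>();

    RunawayClientSketch(int maxAbortsInWindow, long windowMs) {
        this.maxAbortsInWindow = maxAbortsInWindow;
        this.windowMs = windowMs;
    }

    void recordAbort(String agentInfo, long nowMs) {
        abortsByAgent.computeIfAbsent(agentInfo, k -> new ArrayDeque<>()).addLast(nowMs);
    }

    boolean shouldRefuse(String agentInfo, long nowMs) {
        Deque<Long> aborts = abortsByAgent.get(agentInfo);
        if (aborts == null) {
            return false;
        }
        // Drop aborts that have aged out of the window.
        while (!aborts.isEmpty() && nowMs - aborts.peekFirst() > windowMs) {
            aborts.removeFirst();
        }
        return aborts.size() > maxAbortsInWindow;
    }
}
```

Because old aborts age out, a client that has been fixed is admitted again once the window passes, without operator intervention.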





[jira] [Assigned] (HIVE-15967) Add test for Add Partition with data to Acid table

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15967:
-

Assignee: Steve Yeom

> Add test for Add Partition with data to Acid table
> --
>
> Key: HIVE-15967
> URL: https://issues.apache.org/jira/browse/HIVE-15967
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Steve Yeom
>
> This should in principle work as long as the partition is properly bucketed 
> and uses ORC.  Non-acid to acid conversion (in compaction) should just handle 
> it.
> ORC Schema evolution should handle any missing columns (and ignore extra 
> ones) wrt table schema.
> I doubt there are any checks in place to check compatibility.





[jira] [Assigned] (HIVE-15267) Make query length calculation logic more accurate in TxnUtils.needNewQuery()

2017-10-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15267:
-

Assignee: Steve Yeom  (was: Wei Zheng)

> Make query length calculation logic more accurate in TxnUtils.needNewQuery()
> 
>
> Key: HIVE-15267
> URL: https://issues.apache.org/jira/browse/HIVE-15267
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Wei Zheng
>Assignee: Steve Yeom
>
> HIVE-15181 has the following review comment, which this ticket will address:
> {code}
> in TxnUtils.needNewQuery() "sizeInBytes / 1024 > queryMemoryLimit" doesn't do 
> the right thing.
> If the user sets METASTORE_DIRECT_SQL_MAX_QUERY_LENGTH to 1K, they most 
> likely want each SQL string to be at most 1K.
> But if sizeInBytes=2047, this still returns false.
> It should include length of "suffix" in computation of sizeInBytes
> Along the same lines: the check for max query length is done after each batch 
> is already added to the query. Suppose there are 1000 9-digit txn IDs in each 
> IN(...). That's, conservatively, 18KB of text. So the length of each query is 
> increasing in 18KB chunks. 
> I think the check for query length should be done for each item in IN clause.
> If some DB has a limit on query length of X, then any query > X will fail. So 
> I think this must ensure not to produce any queries > X, even by 1 char.
> For example, case 3.1 of the UT generates a query of almost 4000 characters - 
> this is clearly > 1KB.
> {code}
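
The arithmetic in the review comment can be checked with a short sketch. `exceedsLimitTruncating` mirrors the quoted expression `sizeInBytes / 1024 > queryMemoryLimit`; `exceedsLimit` is one hypothetical way to compare in bytes and include the suffix. The class and method names are made up here, and one byte per character is assumed.

```java
// Hypothetical sketch of the length check discussed above; not the TxnUtils code.
final class QueryLengthSketch {

    // Mirrors the quoted expression: integer division truncates, so anything
    // from 1024 to 2047 bytes divides to 1 and is not "> 1".
    static boolean exceedsLimitTruncating(int sizeInBytes, int limitKb) {
        return sizeInBytes / 1024 > limitKb;
    }

    // Compares in bytes and counts the suffix that will still be appended,
    // so a query even one byte over the limit is reported.
    static boolean exceedsLimit(int queryLen, int suffixLen, int limitKb) {
        return (long) queryLen + suffixLen > (long) limitKb * 1024;
    }
}
```

With a 1K limit, a 2047-byte query passes the truncating check but fails the byte-accurate one, which is the case called out in the comment. To keep every query at or under the limit, such a check would have to run before each item is appended to the IN clause, not after each batch.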




