[jira] [Work started] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3231 started by Keegan Mosley. msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Assignee: Keegan Mosley Labels: msck Fix For: 0.10.0 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3231: Attachment: HIVE-3231.2.patch.txt msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Assignee: Keegan Mosley Labels: msck Fix For: 0.10.0 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3231: Assignee: Carl Steinbach (was: Keegan Mosley) Status: Patch Available (was: In Progress) https://reviews.apache.org/r/7649/ msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Assignee: Carl Steinbach Labels: msck Fix For: 0.10.0 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
[ https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3236: Description: When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.\*, two.\* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field. was: When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.*, two.* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field. allow column names to be prefixed by table alias in select all queries -- Key: HIVE-3236 URL: https://issues.apache.org/jira/browse/HIVE-3236 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor Fix For: 0.10.0 Attachments: HIVE-3236.1.patch.txt When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.\*, two.\* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field.
[jira] [Created] (HIVE-3231) msck repair should find partitions already containing data files
Keegan Mosley created HIVE-3231: --- Summary: msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
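The behaviour requested above can be sketched in Python. This is only an illustration under stated assumptions, not the attached patch: it walks a local directory tree with os.walk as a stand-in for the table location on HDFS, and the function name find_partition_dirs and the sample partition keys ds and hr are hypothetical. It reports every directory whose path matches key=value for each partition key, whether or not the directory already contains data files.

```python
import os


def find_partition_dirs(table_root, partition_keys):
    """Return one partition spec (dict) per directory under table_root whose
    relative path has the form key1=val1/key2=val2/... for the given keys.

    Directories are reported regardless of whether they hold data files.
    Hypothetical helper: a local-filesystem stand-in for a table location.
    """
    found = []
    depth = len(partition_keys)
    for dirpath, dirnames, _filenames in os.walk(table_root):
        rel = os.path.relpath(dirpath, table_root)
        parts = [] if rel == "." else rel.split(os.sep)
        if len(parts) != depth:
            continue
        # Partition leaf level reached: do not descend any further.
        dirnames[:] = []
        spec = {}
        for key, part in zip(partition_keys, parts):
            name, sep, value = part.partition("=")
            if sep != "=" or name != key:
                break  # path component is not "<expected key>=<value>"
            spec[key] = value
        else:
            found.append(spec)
    # Sort by partition values so the result does not depend on walk order.
    return sorted(found, key=lambda s: [s[k] for k in partition_keys])
```

Under these assumptions, each returned spec corresponds to one partition that a repair step would register, so copying files into ds=.../hr=... directories first and repairing afterwards would be enough.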
[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3231: Attachment: HIVE-3231.1.patch.txt msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor Labels: msck Attachments: HIVE-3231.1.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3231: Fix Version/s: 0.10.0 Status: Patch Available (was: Open) msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor Labels: msck Fix For: 0.10.0 Attachments: HIVE-3231.1.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files
[ https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3231: Priority: Major (was: Minor) msck repair should find partitions already containing data files Key: HIVE-3231 URL: https://issues.apache.org/jira/browse/HIVE-3231 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Labels: msck Fix For: 0.10.0 Attachments: HIVE-3231.1.patch.txt msck repair currently will only discover partition directories if they are empty. It seems a more apt use case to copy data files into a table, creating the partition directories as you go, rather than creating a bunch of empty partition directories, then running msck repair to dynamically add them, then inserting your actual data files.
[jira] [Created] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
Keegan Mosley created HIVE-3236: --- Summary: allow column names to be prefixed by table alias in select all queries Key: HIVE-3236 URL: https://issues.apache.org/jira/browse/HIVE-3236 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.*, two.* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field.
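The "write a script" workaround mentioned above might look like the following Python sketch. It is an assumption-laden illustration, not the proposed conf flag: the helper names prefixed_select_list and build_ctas are hypothetical, the column lists have to be supplied by the caller, and the alias_column naming simply follows the one_duplicate_field example in the report.

```python
def prefixed_select_list(columns_by_alias):
    """Build a select list in which every column is aliased as <alias>_<column>.

    columns_by_alias maps a table alias to its column names, in output order.
    Hypothetical helper for illustration only.
    """
    items = []
    for alias, columns in columns_by_alias.items():
        for column in columns:
            items.append(f"{alias}.{column} AS {alias}_{column}")
    return ", ".join(items)


def build_ctas(target, columns_by_alias, from_clause):
    """Assemble a CREATE TABLE ... AS SELECT statement with prefixed columns."""
    select_list = prefixed_select_list(columns_by_alias)
    return f"CREATE TABLE {target} AS SELECT {select_list} FROM {from_clause}"
```

For a self-join, the same column list is passed once per alias, so every output column name is unique without hand-written aliases.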
[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
[ https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3236: Attachment: HIVE-3236.1.patch.txt allow column names to be prefixed by table alias in select all queries -- Key: HIVE-3236 URL: https://issues.apache.org/jira/browse/HIVE-3236 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor Fix For: 0.10.0 Attachments: HIVE-3236.1.patch.txt When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.*, two.* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field.
[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
[ https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3236: Fix Version/s: 0.10.0 Status: Patch Available (was: Open) allow column names to be prefixed by table alias in select all queries -- Key: HIVE-3236 URL: https://issues.apache.org/jira/browse/HIVE-3236 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.10.0, 0.9.1 Reporter: Keegan Mosley Priority: Minor Fix For: 0.10.0 Attachments: HIVE-3236.1.patch.txt When using CREATE TABLE x AS SELECT ... where the select joins tables with hundreds of columns it is not a simple task to resolve duplicate column name exceptions (particularly with self-joins). The user must either manually specify aliases for all duplicate columns (potentially hundreds) or write a script to generate the data set in a separate select query, then create the table and load the data in. There should be some conf flag that would allow queries like create table joined as select one.*, two.* from mytable one join mytable two on (one.duplicate_field = two.duplicate_field1); to create a table with columns one_duplicate_field and two_duplicate_field.
[jira] [Created] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation
Keegan Mosley created HIVE-3095: --- Summary: Self-referencing Avro schema creates infinite loop on table creation Key: HIVE-3095 URL: https://issues.apache.org/jira/browse/HIVE-3095 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.1 Reporter: Keegan Mosley Priority: Minor An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html: create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ');
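Why a self-referencing schema can recurse without end, and one common way to stop it, can be illustrated with a small Python sketch. This is not Hive's AvroSerDe code and makes no claim about how the bug was or should be fixed: the function expand and the output notation are hypothetical, only records, unions and primitives are handled, and the linked-list schema is written here as valid JSON (quoted, comments dropped). The walker tracks the records it is currently expanding and emits a reference when it meets one of them again; without that check the lookup of LongList inside LongList would recurse until the stack overflows.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# The linked-list example, written as valid JSON for this sketch.
LONG_LIST = json.loads("""
{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["LongList", "null"]}
  ]
}
""")


def expand(schema, named=None, in_progress=None):
    """Render a parsed Avro schema as a type string.

    named maps record names to their definitions; in_progress holds the
    records currently being expanded, which is the guard against cycles.
    """
    named = {} if named is None else named
    in_progress = set() if in_progress is None else in_progress
    if isinstance(schema, list):  # union: expand each branch
        return "union<" + ",".join(expand(s, named, in_progress) for s in schema) + ">"
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        if schema in in_progress:
            # Self-reference: emit a marker instead of expanding again.
            return "ref<" + schema + ">"
        return expand(named[schema], named, in_progress)
    if isinstance(schema, dict) and schema.get("type") == "record":
        name = schema["name"]
        named[name] = schema
        in_progress.add(name)
        fields = ",".join(
            f["name"] + ":" + expand(f["type"], named, in_progress)
            for f in schema["fields"]
        )
        in_progress.discard(name)
        return "record<" + name + ">{" + fields + "}"
    raise ValueError("schema kind not handled in this sketch")
```

Calling expand(LONG_LIST) terminates because the inner LongList is rendered as a reference rather than expanded a second time.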
[jira] [Updated] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation
[ https://issues.apache.org/jira/browse/HIVE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3095: Description: An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ') stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; was: An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html: create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ') stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; Self-referencing Avro schema creates infinite loop on table creation Key: HIVE-3095 URL: https://issues.apache.org/jira/browse/HIVE-3095 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.1 Reporter: Keegan Mosley Priority: Minor Labels: avro An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ') stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
[jira] [Updated] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation
[ https://issues.apache.org/jira/browse/HIVE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keegan Mosley updated HIVE-3095: Description: An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html: create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ') stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'; was: An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html: create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } '); Self-referencing Avro schema creates infinite loop on table creation Key: HIVE-3095 URL: https://issues.apache.org/jira/browse/HIVE-3095 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.1 Reporter: Keegan Mosley Priority: Minor Labels: avro An Avro schema which has a field reference to itself will create an infinite loop which eventually throws a StackOverflowError. To reproduce using the linked-list example from http://avro.apache.org/docs/1.6.1/spec.html: create table linkedListTest row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' with serdeproperties ('avro.schema.literal'=' { type: record, name: LongList, aliases: [LinkedLongs], // old name for this fields : [ {name: value, type: long}, // each element has a long {name: next, type: [LongList, null]} // optional next element ] } ') stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';