[jira] [Work started] (HIVE-3231) msck repair should find partitions already containing data files

2012-10-18 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3231 started by Keegan Mosley.

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Assignee: Keegan Mosley
  Labels: msck
 Fix For: 0.10.0

 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files

2012-10-18 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3231:


Attachment: HIVE-3231.2.patch.txt

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Assignee: Keegan Mosley
  Labels: msck
 Fix For: 0.10.0

 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.



[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files

2012-10-18 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3231:


Assignee: Carl Steinbach  (was: Keegan Mosley)
  Status: Patch Available  (was: In Progress)

https://reviews.apache.org/r/7649/

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Assignee: Carl Steinbach
  Labels: msck
 Fix For: 0.10.0

 Attachments: HIVE-3231.1.patch.txt, HIVE-3231.2.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.



[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries

2012-07-06 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3236:


Description: 
When using CREATE TABLE x AS SELECT ... where the select joins tables with 
hundreds of columns it is not a simple task to resolve duplicate column name 
exceptions (particularly with self-joins). The user must either manually 
specify aliases for all duplicate columns (potentially hundreds) or write a 
script to generate the data set in a separate select query, then create the 
table and load the data in.


There should be some conf flag that would allow queries like

create table joined as select one.\*, two.\* from mytable one join mytable two 
on (one.duplicate_field = two.duplicate_field1);

to create a table with columns one_duplicate_field and two_duplicate_field.

  was:
When using CREATE TABLE x AS SELECT ... where the select joins tables with 
hundreds of columns it is not a simple task to resolve duplicate column name 
exceptions (particularly with self-joins). The user must either manually 
specify aliases for all duplicate columns (potentially hundreds) or write a 
script to generate the data set in a separate select query, then create the 
table and load the data in.


There should be some conf flag that would allow queries like

create table joined as select one.*, two.* from mytable one join mytable 
two on (one.duplicate_field = two.duplicate_field1);

to create a table with columns one_duplicate_field and two_duplicate_field.


 allow column names to be prefixed by table alias in select all queries
 --

 Key: HIVE-3236
 URL: https://issues.apache.org/jira/browse/HIVE-3236
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
 Fix For: 0.10.0

 Attachments: HIVE-3236.1.patch.txt


 When using CREATE TABLE x AS SELECT ... where the select joins tables with 
 hundreds of columns it is not a simple task to resolve duplicate column name 
 exceptions (particularly with self-joins). The user must either manually 
 specify aliases for all duplicate columns (potentially hundreds) or write a 
 script to generate the data set in a separate select query, then create the 
 table and load the data in.
 There should be some conf flag that would allow queries like
 create table joined as select one.\*, two.\* from mytable one join mytable 
 two on (one.duplicate_field = two.duplicate_field1);
 to create a table with columns one_duplicate_field and two_duplicate_field.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-3231) msck repair should find partitions already containing data files

2012-07-05 Thread Keegan Mosley (JIRA)
Keegan Mosley created HIVE-3231:
---

 Summary: msck repair should find partitions already containing 
data files
 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor


msck repair currently will only discover partition directories if they are 
empty.

It seems a more apt use case to copy data files into a table, creating the 
partition directories as you go, rather than creating a bunch of empty 
partition directories, then running msck repair to dynamically add them, then 
inserting your actual data files.
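
The requested behaviour can be sketched outside Hive as a directory walk that treats every `key=value` directory as a partition, whether or not it already holds data files. This is only an illustration under assumptions, not Hive's implementation: the function names, the `ds` partition key, and the local-filesystem layout are all hypothetical.

```python
import os
import tempfile


def find_partition_dirs(table_root, partition_keys):
    """Return partition specs for key=value directories under table_root.

    A directory is reported whether it is empty or already contains files.
    """
    found = []

    def walk(path, depth, spec):
        if depth == len(partition_keys):
            found.append(dict(spec))
            return
        key = partition_keys[depth]
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            # Only directories named "<key>=<value>" count as partitions.
            if os.path.isdir(child) and name.startswith(key + "="):
                walk(child, depth + 1, spec + [(key, name[len(key) + 1:])])

    walk(table_root, 0, [])
    return found


def build_demo_table():
    """Create a hypothetical table layout in a temporary directory."""
    root = tempfile.mkdtemp()
    loaded = os.path.join(root, "ds=2012-07-05")
    os.makedirs(loaded)
    with open(os.path.join(loaded, "part-00000"), "w") as handle:
        handle.write("1\n")  # partition that already contains a data file
    os.makedirs(os.path.join(root, "ds=2012-07-06"))  # empty partition
    os.makedirs(os.path.join(root, "_tmp"))  # not a partition, skipped
    return root
```

Calling `find_partition_dirs(build_demo_table(), ["ds"])` reports both the populated and the empty partition directory, which is the outcome the issue asks msck repair to produce.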





[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files

2012-07-05 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3231:


Attachment: HIVE-3231.1.patch.txt

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
  Labels: msck
 Attachments: HIVE-3231.1.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.





[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files

2012-07-05 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3231:


Fix Version/s: 0.10.0
   Status: Patch Available  (was: Open)

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
  Labels: msck
 Fix For: 0.10.0

 Attachments: HIVE-3231.1.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.





[jira] [Updated] (HIVE-3231) msck repair should find partitions already containing data files

2012-07-05 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3231:


Priority: Major  (was: Minor)

 msck repair should find partitions already containing data files
 

 Key: HIVE-3231
 URL: https://issues.apache.org/jira/browse/HIVE-3231
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
  Labels: msck
 Fix For: 0.10.0

 Attachments: HIVE-3231.1.patch.txt


 msck repair currently will only discover partition directories if they are 
 empty.
 It seems a more apt use case to copy data files into a table, creating the 
 partition directories as you go, rather than creating a bunch of empty 
 partition directories, then running msck repair to dynamically add them, then 
 inserting your actual data files.





[jira] [Created] (HIVE-3236) allow column names to be prefixed by table alias in select all queries

2012-07-05 Thread Keegan Mosley (JIRA)
Keegan Mosley created HIVE-3236:
---

 Summary: allow column names to be prefixed by table alias in 
select all queries
 Key: HIVE-3236
 URL: https://issues.apache.org/jira/browse/HIVE-3236
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor


When using CREATE TABLE x AS SELECT ... where the select joins tables with 
hundreds of columns it is not a simple task to resolve duplicate column name 
exceptions (particularly with self-joins). The user must either manually 
specify aliases for all duplicate columns (potentially hundreds) or write a 
script to generate the data set in a separate select query, then create the 
table and load the data in.


There should be some conf flag that would allow queries like

create table joined as select one.*, two.* from mytable one join mytable two 
on (one.duplicate_field = two.duplicate_field1);

to create a table with columns one_duplicate_field and two_duplicate_field.
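
Until such a flag exists, the scripted workaround mentioned above might look like the following sketch. It generates the explicit alias list that the user would otherwise type by hand. The function name and the column lists are hypothetical; in practice the column names would come from the table definition.

```python
def prefixed_select_list(columns_by_alias):
    """Build a select list that renames every column to <alias>_<column>."""
    items = []
    for alias, columns in columns_by_alias.items():
        for column in columns:
            items.append(f"{alias}.{column} AS {alias}_{column}")
    return ", ".join(items)


def build_ctas(target, source, join_condition, columns_by_alias):
    """Assemble a CREATE TABLE ... AS SELECT statement for a two-way self-join."""
    left, right = list(columns_by_alias)
    return (
        f"create table {target} as select "
        f"{prefixed_select_list(columns_by_alias)} "
        f"from {source} {left} join {source} {right} on ({join_condition})"
    )
```

For a table with hundreds of columns the generated statement is long, but every output column name is unique, so the duplicate column name exception no longer arises.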





[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries

2012-07-05 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3236:


Attachment: HIVE-3236.1.patch.txt

 allow column names to be prefixed by table alias in select all queries
 --

 Key: HIVE-3236
 URL: https://issues.apache.org/jira/browse/HIVE-3236
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
 Fix For: 0.10.0

 Attachments: HIVE-3236.1.patch.txt


 When using CREATE TABLE x AS SELECT ... where the select joins tables with 
 hundreds of columns it is not a simple task to resolve duplicate column name 
 exceptions (particularly with self-joins). The user must either manually 
 specify aliases for all duplicate columns (potentially hundreds) or write a 
 script to generate the data set in a separate select query, then create the 
 table and load the data in.
 There should be some conf flag that would allow queries like
 create table joined as select one.*, two.* from mytable one join mytable two 
 on (one.duplicate_field = two.duplicate_field1);
 to create a table with columns one_duplicate_field and two_duplicate_field.





[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries

2012-07-05 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3236:


Fix Version/s: 0.10.0
   Status: Patch Available  (was: Open)

 allow column names to be prefixed by table alias in select all queries
 --

 Key: HIVE-3236
 URL: https://issues.apache.org/jira/browse/HIVE-3236
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
 Fix For: 0.10.0

 Attachments: HIVE-3236.1.patch.txt


 When using CREATE TABLE x AS SELECT ... where the select joins tables with 
 hundreds of columns it is not a simple task to resolve duplicate column name 
 exceptions (particularly with self-joins). The user must either manually 
 specify aliases for all duplicate columns (potentially hundreds) or write a 
 script to generate the data set in a separate select query, then create the 
 table and load the data in.
 There should be some conf flag that would allow queries like
 create table joined as select one.*, two.* from mytable one join mytable two 
 on (one.duplicate_field = two.duplicate_field1);
 to create a table with columns one_duplicate_field and two_duplicate_field.





[jira] [Created] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation

2012-06-06 Thread Keegan Mosley (JIRA)
Keegan Mosley created HIVE-3095:
---

 Summary: Self-referencing Avro schema creates infinite loop on 
table creation
 Key: HIVE-3095
 URL: https://issues.apache.org/jira/browse/HIVE-3095
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.1
Reporter: Keegan Mosley
Priority: Minor


An Avro schema which has a field reference to itself will create an infinite 
loop which eventually throws a StackOverflowError.

To reproduce using the linked-list example from 
http://avro.apache.org/docs/1.6.1/spec.html:

create table linkedListTest row format serde 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='
{
   "type": "record", 
   "name": "LongList",
   "aliases": ["LinkedLongs"],  // old name for this
   "fields" : [
      {"name": "value", "type": "long"}, // each element has a long
      {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
}
');
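
One way to see why such a schema is dangerous for a naive recursive converter: each reference to LongList leads back into the record that is still being expanded. The sketch below is an assumption about the general shape of the problem, not the AvroSerDe code. It walks a schema held as plain Python data and reports records that are referenced from inside their own definition, by remembering which records are currently open. The function name is hypothetical, and array and map types are not descended into.

```python
def recursive_record_names(schema, open_records=()):
    """Collect names of records referenced from inside their own definition."""
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to a named type.
        return {schema} if schema in open_records else set()
    if isinstance(schema, list):
        # A union: check every branch.
        hits = set()
        for branch in schema:
            hits |= recursive_record_names(branch, open_records)
        return hits
    if isinstance(schema, dict) and schema.get("type") == "record":
        # Remember that this record is being expanded before visiting fields.
        nested = open_records + (schema["name"],)
        hits = set()
        for field in schema["fields"]:
            hits |= recursive_record_names(field["type"], nested)
        return hits
    return set()


# The linked-list schema from the reproduction, as Python data.
LONG_LIST = {
    "type": "record",
    "name": "LongList",
    "fields": [
        {"name": "value", "type": "long"},
        {"name": "next", "type": ["LongList", "null"]},
    ],
}
```

A converter that expands the `next` field without a check like `open_records` would keep re-entering LongList, which matches the unbounded recursion described above.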





[jira] [Updated] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation

2012-06-06 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3095:


Description: 
An Avro schema which has a field reference to itself will create an infinite 
loop which eventually throws a StackOverflowError.

To reproduce using the linked-list example from 
http://avro.apache.org/docs/1.6.1/spec.html

create table linkedListTest row format serde 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='
{
   "type": "record", 
   "name": "LongList",
   "aliases": ["LinkedLongs"],  // old name for this
   "fields" : [
      {"name": "value", "type": "long"}, // each element has a long
      {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
}
')
stored as inputformat 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

  was:
An Avro schema which has a field reference to itself will create an infinite 
loop which eventually throws a StackOverflowError.

To reproduce using the linked-list example from 
http://avro.apache.org/docs/1.6.1/spec.html:

create table linkedListTest row format serde 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='
{
   "type": "record", 
   "name": "LongList",
   "aliases": ["LinkedLongs"],  // old name for this
   "fields" : [
      {"name": "value", "type": "long"}, // each element has a long
      {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
}
')
stored as inputformat 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';


 Self-referencing Avro schema creates infinite loop on table creation
 

 Key: HIVE-3095
 URL: https://issues.apache.org/jira/browse/HIVE-3095
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.1
Reporter: Keegan Mosley
Priority: Minor
  Labels: avro

 An Avro schema which has a field reference to itself will create an infinite 
 loop which eventually throws a StackOverflowError.
 To reproduce using the linked-list example from 
 http://avro.apache.org/docs/1.6.1/spec.html
 create table linkedListTest row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 with serdeproperties ('avro.schema.literal'='
 {
    "type": "record",
    "name": "LongList",
    "aliases": ["LinkedLongs"],  // old name for this
    "fields" : [
       {"name": "value", "type": "long"}, // each element has a long
       {"name": "next", "type": ["LongList", "null"]} // optional next element
    ]
 }
 ')
 stored as inputformat 
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';





[jira] [Updated] (HIVE-3095) Self-referencing Avro schema creates infinite loop on table creation

2012-06-06 Thread Keegan Mosley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keegan Mosley updated HIVE-3095:


Description: 
An Avro schema which has a field reference to itself will create an infinite 
loop which eventually throws a StackOverflowError.

To reproduce using the linked-list example from 
http://avro.apache.org/docs/1.6.1/spec.html:

create table linkedListTest row format serde 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='
{
   "type": "record", 
   "name": "LongList",
   "aliases": ["LinkedLongs"],  // old name for this
   "fields" : [
      {"name": "value", "type": "long"}, // each element has a long
      {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
}
')
stored as inputformat 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

  was:
An Avro schema which has a field reference to itself will create an infinite 
loop which eventually throws a StackOverflowError.

To reproduce using the linked-list example from 
http://avro.apache.org/docs/1.6.1/spec.html:

create table linkedListTest row format serde 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
with serdeproperties ('avro.schema.literal'='
{
   "type": "record", 
   "name": "LongList",
   "aliases": ["LinkedLongs"],  // old name for this
   "fields" : [
      {"name": "value", "type": "long"}, // each element has a long
      {"name": "next", "type": ["LongList", "null"]} // optional next element
   ]
}
');


 Self-referencing Avro schema creates infinite loop on table creation
 

 Key: HIVE-3095
 URL: https://issues.apache.org/jira/browse/HIVE-3095
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.1
Reporter: Keegan Mosley
Priority: Minor
  Labels: avro

 An Avro schema which has a field reference to itself will create an infinite 
 loop which eventually throws a StackOverflowError.
 To reproduce using the linked-list example from 
 http://avro.apache.org/docs/1.6.1/spec.html:
 create table linkedListTest row format serde 
 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 with serdeproperties ('avro.schema.literal'='
 {
    "type": "record",
    "name": "LongList",
    "aliases": ["LinkedLongs"],  // old name for this
    "fields" : [
       {"name": "value", "type": "long"}, // each element has a long
       {"name": "next", "type": ["LongList", "null"]} // optional next element
    ]
 }
 ')
 stored as inputformat 
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
