[jira] [Created] (HIVE-12608) Parquet Schema Evolution doesn't work when a column is dropped from array<struct<>>

2015-12-07 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-12608:


 Summary: Parquet Schema Evolution doesn't work when a column is 
dropped from array<struct<>>
 Key: HIVE-12608
 URL: https://issues.apache.org/jira/browse/HIVE-12608
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


When a column is dropped from an array<struct<>>, reading the table fails with the exception shown below.

I used the following SQL to reproduce it.

{quote}
CREATE TABLE arrays_of_struct_to_map (locations1 array>, 
locations2 array>) STORED AS PARQUET;
INSERT INTO TABLE arrays_of_struct_to_map select 
array(named_struct("c1",1,"c2",2)), array(named_struct("f1",
77,"f2",88,"f3",99)) FROM parquet_type_promotion LIMIT 1;
SELECT * FROM arrays_of_struct_to_map;
-- Testing schema evolution of dropping column from array>
ALTER TABLE arrays_of_struct_to_map REPLACE COLUMNS (locations1 
array>, locations2
array>);
SELECT * FROM arrays_of_struct_to_map;
{quote}
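The exception below is raised by `HiveStructConverter.getStructFieldTypeInfo`: the Parquet file still stores `c2`, but after `REPLACE COLUMNS` the table's struct only declares `c1`. One possible direction is to resolve each file field against the table's struct by name (case-insensitively, as `getFieldTypeIgnoreCase` suggests) and skip fields the table no longer declares instead of throwing. The sketch below is a minimal stand-alone model of that idea; `StructFieldResolver` and `resolve` are hypothetical names, not Hive APIs, and field names are plain strings rather than Hive `TypeInfo` objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical model: map Parquet file struct fields onto the table's struct fields. */
public class StructFieldResolver {

    /**
     * For each file field, returns the index of the table field with the same
     * name (case-insensitive), or -1 if the table no longer declares it.
     * A reader would skip (not convert) file fields mapped to -1.
     */
    public static int[] resolve(List<String> fileFields, List<String> tableFields) {
        int[] mapping = new int[fileFields.size()];
        for (int i = 0; i < fileFields.size(); i++) {
            mapping[i] = -1; // treat the field as dropped until a match is found
            String wanted = fileFields.get(i).toLowerCase(Locale.ROOT);
            for (int j = 0; j < tableFields.size(); j++) {
                if (tableFields.get(j).toLowerCase(Locale.ROOT).equals(wanted)) {
                    mapping[i] = j;
                    break;
                }
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // File written as struct<c1,c2>; table altered to struct<c1>.
        int[] m = resolve(List.of("c1", "c2"), List.of("c1"));
        System.out.println(Arrays.toString(m)); // [0, -1]
    }
}
```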

{quote}
2015-12-07 11:47:28,503 ERROR [main]: CliDriver 
(SessionState.java:printError(921)) - Failed with exception 
java.io.IOException:java.lang.RuntimeException: cannot find field c2 in [c1]
java.io.IOException: java.lang.RuntimeException: cannot find field c2 in [c1]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1029)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1003)
at 
org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:139)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_type_promotion(TestCliDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.lang.RuntimeException: cannot find field c2 in [c1]
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.getStructFieldTypeInfo(HiveStructConverter.java:130)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.getFieldTypeIgnoreCase(HiveStructConverter.java:103)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.init(HiveStructConverter.java:90)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.<init>(HiveStructConverter.java:67)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.<init>(HiveStructConverter.java:59)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:63)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:75)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveCollectionConverter$ElementConverter.<init>(HiveCollectionConverter.java:141)
at 
org.apache.hadoop.hive.ql.io.parquet.co

[jira] [Created] (HIVE-12475) Parquet schema evolution within array<struct<>> doesn't work

2015-11-19 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-12475:


 Summary: Parquet schema evolution within array<struct<>> doesn't 
work
 Key: HIVE-12475
 URL: https://issues.apache.org/jira/browse/HIVE-12475
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.1.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


If we create a table with a column of type array<struct<>> and later add a field 
to the struct, reading the table fails with the following exception.

The following SQL statements reproduce the error:

{quote}
CREATE TABLE pq_test (f1 array>) STORED AS  PARQUET;
INSERT INTO TABLE pq_test select array(named_struct("c1",1,"c2",2)) FROM tmp 
LIMIT 2;

SELECT * from pq_test;

ALTER TABLE pq_test REPLACE COLUMNS (f1 
array>); //* cc
SELECT * from pq_test;
{quote}

Exception:
{quote}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:142)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:363)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:316)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:199)
at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:236)
at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:89)
{quote}
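The `ArrayIndexOutOfBoundsException: 2` above is thrown from `ArrayWritableObjectInspector.getStructFieldData`, which suggests the inspector asks for a third struct field (index 2) that rows written before the `ALTER TABLE` do not contain. A minimal sketch of a tolerant lookup, assuming a stored struct is modeled as a plain `Object[]` (`WidenedStructReader` and `fieldOrNull` are hypothetical names, not Hive's API), returns NULL for fields added after the data was written:

```java
/** Hypothetical model of reading a struct field from a row written with an older, narrower schema. */
public class WidenedStructReader {

    /**
     * Returns the value at fieldIndex, or null if the stored struct predates
     * that field (for example, a field added later via REPLACE COLUMNS).
     */
    public static Object fieldOrNull(Object[] storedValues, int fieldIndex) {
        if (storedValues == null || fieldIndex >= storedValues.length) {
            return null; // missing trailing field: read as NULL instead of throwing
        }
        return storedValues[fieldIndex];
    }

    public static void main(String[] args) {
        Object[] oldRow = {1, 2};      // written as struct<c1,c2>
        for (int i = 0; i < 3; i++) {   // table now declares three fields
            System.out.println(fieldOrNull(oldRow, i)); // 1, 2, null
        }
    }
}
```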



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12080) Support auto type widening for Parquet table

2015-10-09 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-12080:


 Summary: Support auto type widening for Parquet table
 Key: HIVE-12080
 URL: https://issues.apache.org/jira/browse/HIVE-12080
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


Currently, Hive does not support automatic type widening for Parquet tables. 
Support should include at least the basic type promotions, such as 
short->int->bigint and float->double, that are already supported for other 
file formats.

A similar effort (HIVE-6784) was never committed. This JIRA addresses the same 
problem in a different way, with little or no performance impact.
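A minimal sketch of the widening check described above, assuming only the promotions named in this JIRA (smallint/short -> int -> bigint, float -> double). `ParquetTypeWidening`, `HiveType`, `canWiden`, and `widen` are hypothetical names for illustration, not Hive's API:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the widening rules named in this JIRA (not Hive's implementation). */
public class ParquetTypeWidening {

    public enum HiveType { SMALLINT, INT, BIGINT, FLOAT, DOUBLE }

    // For each type stored in the file, the table types it may be read as.
    private static final Map<HiveType, Set<HiveType>> ALLOWED = Map.of(
        HiveType.SMALLINT, EnumSet.of(HiveType.SMALLINT, HiveType.INT, HiveType.BIGINT),
        HiveType.INT,      EnumSet.of(HiveType.INT, HiveType.BIGINT),
        HiveType.BIGINT,   EnumSet.of(HiveType.BIGINT),
        HiveType.FLOAT,    EnumSet.of(HiveType.FLOAT, HiveType.DOUBLE),
        HiveType.DOUBLE,   EnumSet.of(HiveType.DOUBLE));

    /** True if a value written as fileType can be read as tableType without loss. */
    public static boolean canWiden(HiveType fileType, HiveType tableType) {
        return ALLOWED.getOrDefault(fileType, EnumSet.noneOf(HiveType.class)).contains(tableType);
    }

    /** Converts a file value to the Java type backing tableType; call only after canWiden. */
    public static Object widen(Object value, HiveType tableType) {
        switch (tableType) {
            case INT:    return ((Number) value).intValue();
            case BIGINT: return ((Number) value).longValue();
            case DOUBLE: return ((Number) value).doubleValue();
            default:     return value; // SMALLINT or FLOAT: nothing narrower to widen from
        }
    }
}
```

One way to keep the overhead low, under these assumptions, is to call `canWiden` once per column when comparing the file schema with the table schema and to apply `widen` per value only for columns whose types actually differ.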
 





[jira] [Created] (HIVE-12018) beeline --help doesn't return to original prompt

2015-10-02 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-12018:


 Summary: beeline --help doesn't return to original prompt
 Key: HIVE-12018
 URL: https://issues.apache.org/jira/browse/HIVE-12018
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Priority: Minor


"beeline --help" displays the help message and then drops into the beeline 
prompt. The common convention is to return to the Unix shell prompt instead: 
the usual reason for asking a command for help is to relaunch the same command 
with the correct parameters.
Sample output:
{quote}
$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine 
   -u <database url>               the JDBC URL to connect to
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
.

Beeline version .. by Apache Hive
beeline> 
{quote}

After printing the help text, beeline should exit back to the Unix shell prompt ("$").
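A minimal sketch of the requested behavior: when `--help` is present, print the usage text and return an exit status instead of entering the interactive loop. `HelpThenExit`, `Shell`, and `run` are hypothetical names; this is not BeeLine's actual code, and returning 0 for `--help` is an assumption.

```java
import java.io.PrintStream;
import java.util.Arrays;

/** Hypothetical sketch of "print help, then exit" (not BeeLine's actual code). */
public class HelpThenExit {

    /** Stand-in for the interactive "beeline>" loop. */
    public interface Shell { void runInteractive(); }

    /** Returns a process exit status; --help never enters the interactive shell. */
    public static int run(String[] args, PrintStream out, Shell shell) {
        if (Arrays.asList(args).contains("--help")) {
            out.println("Usage: java org.apache.hive.cli.beeline.BeeLine");
            return 0; // caller exits, so control returns to the Unix prompt
        }
        shell.runInteractive();
        return 0;
    }

    public static void main(String[] args) {
        int status = run(args, System.out,
            () -> System.out.println("beeline> (interactive loop would start here)"));
        System.exit(status);
    }
}
```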





[jira] [Created] (HIVE-10787) MatchPath misses the last matched row from the final result set

2015-05-21 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-10787:


 Summary: MatchPath misses the last matched row from the final 
result set
 Key: HIVE-10787
 URL: https://issues.apache.org/jira/browse/HIVE-10787
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


If a pattern ends with a STAR (*) symbol, the current code drops the last matched 
row from the final result. For example, with the pattern (LATE.EARLY*), the 
matched rows are:
1. LATE
2. EARLY
In the current implementation, the final 'tpath' misses the trailing "EARLY" and 
returns only LATE. It should return both LATE and EARLY.

The following code snippet shows the bug.
{noformat}
0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
1. while (rowResult.matches && pItr.hasNext())
2. {
3.   row = pItr.next();
4.   rowResult = symbolFn.match(row, pItr);
5. }
6.
7. result.nextRow = pItr.getIndex() - 1;
{noformat}

Line 7 always moves the row index back by one. That is correct when the loop 
stops on a row that does not match, but not when it stops because the iterator 
is exhausted. In particular, if the loop (line 1) never executes because 
pItr.hasNext() is false, the code still moves the row pointer back by one, even 
though line 0 found a match and the iterator has already reached the end.
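A simplified Java model of the loop above and of one possible fix, which steps back one row only when the loop stopped on a non-matching row. It assumes `getIndex()` returns the index of the row `next()` would return, and that `nextRow` marks the first row not consumed by the match; `Cursor`, `StarMatchSketch`, and both methods are hypothetical and are not Hive's MatchPath code.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the star (*) loop; rows are strings, a symbol is a Predicate. */
public class StarMatchSketch {

    /** Minimal cursor: getIndex() is the index of the row next() would return. */
    static final class Cursor {
        private final List<String> rows;
        private int index;
        Cursor(List<String> rows, int start) { this.rows = rows; this.index = start; }
        boolean hasNext() { return index < rows.size(); }
        String next() { return rows.get(index++); }
        int getIndex() { return index; }
    }

    /** Mirrors lines 0-7: always steps back one row, even if the last row matched. */
    public static int nextRowBuggy(List<String> rows, int start, Predicate<String> symbol) {
        Cursor pItr = new Cursor(rows, start);
        boolean matches = symbol.test(pItr.next());
        while (matches && pItr.hasNext()) {
            matches = symbol.test(pItr.next());
        }
        return pItr.getIndex() - 1;
    }

    /** Steps back only when the last row examined did not match. */
    public static int nextRowFixed(List<String> rows, int start, Predicate<String> symbol) {
        Cursor pItr = new Cursor(rows, start);
        boolean matches = symbol.test(pItr.next());
        while (matches && pItr.hasNext()) {
            matches = symbol.test(pItr.next());
        }
        return matches ? pItr.getIndex() : pItr.getIndex() - 1;
    }
}
```

With rows `LATE, EARLY` and the EARLY* match starting at index 1, `nextRowBuggy` returns 1 (dropping EARLY) while `nextRowFixed` returns 2. When a non-matching row follows, as in `LATE, EARLY, LATE`, both return 2.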

I'm uploading a patch that I have already tested.
  





[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-06-10 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026233#comment-14026233
 ] 

Mohammad Kamrul Islam commented on HIVE-6638:
-

Agreed with [~ashutoshc]. Please go ahead. 

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, 
> and therefore the query, still runs to completion, it would be better if Hive 
> did not need to redo all the work done under the previous AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-05-28 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011831#comment-14011831
 ] 

Mohammad Kamrul Islam commented on HIVE-6638:
-

Thanks [~ashutoshc] for the review.
In the MR JIRA, we changed the behavior slightly.
I will upload a patch soon to match the new MR behavior.





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000642#comment-14000642
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Null is passed only if record schema is null but file schema is not null.
Do you see any use case for decimal too?

>Thus, we might need to fix in a different way.

Do you want me to fix it differently, or are you planning to address the decimal 
case differently?



> Unable to deserialize AVRO data when file schema and record schema are 
> different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and the record schema are not the same, and
> 2) the record schema is nullable but the file schema is not.
> The likely code location is in class AvroDeserializer:
>  
> {noformat}
> if (AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
> }
> {noformat}
> The snippet above checks whether recordSchema is nullable, but it never checks 
> fileSchema.
> I tested with these values:
> {noformat}
> recordSchema= ["null","string"]
> fileSchema= "string"
> {noformat}
> And I got the following exception (with my debugged version of the code):
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string" 
> at org.apache.avro.Schema.getTypes(Schema.java:272)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
> at 
> org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
> at 
> org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999687#comment-13999687
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

[~xuefuz]: can you please help me understand the problem mentioned in the 
previous comment?




[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999685#comment-13999685
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-


>HIVE-5823 was resolved as WONTFIX.

[~cwsteinbach] I see it was committed by [~brocknoland]. Is it possible we are 
looking at different JIRAs?

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
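A minimal sketch of the idea in this description, using plain strings instead of Hive's `TypeInfo` and Avro's `Schema` classes: map each table column's Hive primitive type to an Avro primitive and wrap it in a `["null", T]` union, since Hive columns are nullable. Fields are named after the table's declared column names. `HiveToAvroSchemaSketch` and `toAvroSchema` are hypothetical; complex types, decimal, and name escaping are not handled.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: derive an Avro record schema (JSON text) from Hive primitive column types. */
public class HiveToAvroSchemaSketch {

    // Hive primitive type name -> Avro primitive type name (subset only).
    private static final Map<String, String> PRIMITIVES = Map.of(
        "boolean", "boolean",
        "int", "int",
        "bigint", "long",
        "float", "float",
        "double", "double",
        "string", "string",
        "binary", "bytes");

    /** columns: table column name -> Hive type name; pass an ordered map to keep column order. */
    public static String toAvroSchema(String recordName, Map<String, String> columns) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String avroType = PRIMITIVES.get(col.getValue().toLowerCase(Locale.ROOT));
            if (avroType == null) {
                throw new IllegalArgumentException("unsupported type: " + col.getValue());
            }
            // Every Hive column may hold NULL, so each field becomes a ["null", T] union.
            fields.add("{\"name\":\"" + col.getKey() + "\",\"type\":[\"null\",\"" + avroType + "\"]}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":" + fields + "}";
    }
}
```

For example, a `LinkedHashMap` with `key -> string` and `value -> bigint` yields a record whose fields are `key: ["null","string"]` and `value: ["null","long"]`.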





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.10.patch

Rebasing with latest code.



[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-16 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Status: Patch Available  (was: Open)



[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-15 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998371#comment-13998371
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Thanks @xzhang.
>However, the fix in your patch seems having a problem with decimal, which may 
>need more deliberation.

What is the (potential) problem with decimal?
Do you have a proposal for addressing the decimal problem?





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-14 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997378#comment-13997378
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

Thanks [~xuefuz] for the comments.

I believe this is a valid Avro schema evolution.
Please see the following rules, copied from the Avro specification:
http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution
{noformat}
* if reader's is a union, but writer's is not
The first schema in the reader's union that matches the writer's schema is 
recursively resolved against it. If none match, an error is signalled.
* if writer's is a union, but reader's is not
If the reader's schema matches the selected writer's schema, it is recursively 
resolved against it. If they do not match, an error is signalled.
{noformat}

Moreover, I tested a similar scenario using plain Avro code: I wrote data with 
the schema "string" and read it back with ["null","string"].
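The first rule quoted above (the reader's schema is a union, the writer's is not) can be sketched for primitive types as follows. Type names are plain strings, and `UnionResolutionSketch.resolveReaderUnion` is a hypothetical helper, not part of the Avro API; Avro's numeric promotions (for example int to long) are deliberately omitted.

```java
import java.util.List;

/** Hypothetical sketch of one Avro resolution rule: reader is a union, writer is not. */
public class UnionResolutionSketch {

    /**
     * Returns the index of the first reader-union branch matching the writer's
     * type, or throws if none matches. Matching is by exact primitive type name.
     */
    public static int resolveReaderUnion(List<String> readerUnion, String writerType) {
        for (int i = 0; i < readerUnion.size(); i++) {
            if (readerUnion.get(i).equals(writerType)) {
                return i; // first matching branch wins, per the quoted rule
            }
        }
        throw new IllegalArgumentException(
            "no branch of " + readerUnion + " matches writer type " + writerType);
    }
}
```

For recordSchema = ["null","string"] and fileSchema = "string", the helper selects branch 1 ("string"), so the datum can be decoded as a plain string rather than by asking the non-union file schema for its union branches, which is the call that raises `Not a union: "string"`.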



[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-7049:


Attachment: HIVE-7049.1.patch

patch uploaded



[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-13 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995897#comment-13995897
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

The recently committed HIVE-5823 introduced a bug.
I created a separate JIRA (HIVE-7049) to address it and uploaded a patch there.





[jira] [Commented] (HIVE-5803) Support CTAS from a non-avro table to an avro table

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995899#comment-13995899
 ] 

Mohammad Kamrul Islam commented on HIVE-5803:
-

The dependent JIRA resolves the issue. Closing it.

> Support CTAS from a non-avro table to an avro table
> ---
>
> Key: HIVE-5803
> URL: https://issues.apache.org/jira/browse/HIVE-5803
> Project: Hive
>  Issue Type: Task
>Reporter: Mohammad Kamrul Islam
>Assignee: Carl Steinbach
>
> Hive currently does not work with HQL like:
> CREATE TABLE  as SELECT * from ;
> Actually, the CTAS itself succeeds, but when I run "SELECT * from 
>  .." it fails.
> This JIRA depends on HIVE-3159, which translates TypeInfo to an Avro schema.
> Findings so far: CTAS uses internal column names (instead of the column 
> names provided in the SELECT) when creating the Avro data file. In other 
> words, the Avro data file has column names of the form _col0, _col1, whereas 
> the table column names are different.
> I tested with the following test cases and it failed:
> - verify 1) can create table using create table as select from non-avro table 
> 2) LOAD avro data into new table and read data from the new table
> CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE;
> DESCRIBE simple_kv_txt;
> LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt;
> SELECT * FROM simple_kv_txt ORDER BY KEY;
> CREATE TABLE copy_doctors ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key 
> as key, value as value FROM simple_kv_txt;
> DESCRIBE copy_doctors;
> SELECT * FROM copy_doctors;
>  





[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995896#comment-13995896
 ] 

Mohammad Kamrul Islam commented on HIVE-7049:
-

RB at: https://reviews.apache.org/r/21353/

> Unable to deserialize AVRO data when file schema and record schema are 
> different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and record schema are not the same, and
> 2) the record schema is nullable but the file schema is not.
> The likely code location is in class AvroDeserializer:
>  
> {noformat}
> if(AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
> }
> {noformat}
> In the above code snippet, recordSchema is checked for nullability, but 
> the file schema is not.
> I tested with these values:
> {noformat}
> recordSchema = ["null","string"]
> fileSchema = "string"
> {noformat}
> And I got the following exception (with my debugged code version):
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string" 
> at org.apache.avro.Schema.getTypes(Schema.java:272)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
> at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
> at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}





[jira] [Resolved] (HIVE-5803) Support CTAS from a non-avro table to an avro table

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam resolved HIVE-5803.
-

Resolution: Won't Fix

> Support CTAS from a non-avro table to an avro table
> ---
>
> Key: HIVE-5803
> URL: https://issues.apache.org/jira/browse/HIVE-5803
> Project: Hive
>  Issue Type: Task
>Reporter: Mohammad Kamrul Islam
>Assignee: Carl Steinbach
>
> Hive currently does not work with HQL like:
> CREATE TABLE  as SELECT * from ;
> Actually, the CTAS itself succeeds, but when I run "SELECT * from 
>  .." it fails.
> This JIRA depends on HIVE-3159, which translates TypeInfo to an Avro schema.
> Findings so far: CTAS uses internal column names (instead of the 
> column names provided in the SELECT) when creating the Avro data file. In other 
> words, the Avro data file has column names of the form _col0, _col1, whereas 
> the table column names are different.
> I tested with the following test case, and it failed:
> - verify 1) a table can be created using CREATE TABLE AS SELECT from a non-Avro table, 
> 2) Avro data is loaded into the new table and can be read back from it
> CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE;
> DESCRIBE simple_kv_txt;
> LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt;
> SELECT * FROM simple_kv_txt ORDER BY KEY;
> CREATE TABLE copy_doctors ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key 
> as key, value as value FROM simple_kv_txt;
> DESCRIBE copy_doctors;
> SELECT * FROM copy_doctors;
>  





[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-7049:


Status: Patch Available  (was: Open)

> Unable to deserialize AVRO data when file schema and record schema are 
> different and nullable
> -
>
> Key: HIVE-7049
> URL: https://issues.apache.org/jira/browse/HIVE-7049
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-7049.1.patch
>
>
> It mainly happens when:
> 1) the file schema and record schema are not the same, and
> 2) the record schema is nullable but the file schema is not.
> The likely code location is in class AvroDeserializer:
>  
> {noformat}
> if(AvroSerdeUtils.isNullableType(recordSchema)) {
>   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
> }
> {noformat}
> In the above code snippet, recordSchema is checked for nullability, but 
> the file schema is not.
> I tested with these values:
> {noformat}
> recordSchema = ["null","string"]
> fileSchema = "string"
> {noformat}
> And I got the following exception (with my debugged code version):
> {noformat}
> org.apache.avro.AvroRuntimeException: Not a union: "string" 
> at org.apache.avro.Schema.getTypes(Schema.java:272)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
> at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
> at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
> at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
> {noformat}





[jira] [Created] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-7049:
---

 Summary: Unable to deserialize AVRO data when file schema and 
record schema are different and nullable
 Key: HIVE-7049
 URL: https://issues.apache.org/jira/browse/HIVE-7049
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


It mainly happens when:
1) the file schema and record schema are not the same, and
2) the record schema is nullable but the file schema is not.

The likely code location is in class AvroDeserializer:
 
{noformat}
if(AvroSerdeUtils.isNullableType(recordSchema)) {
  return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
}
{noformat}

In the above code snippet, recordSchema is checked for nullability, but the 
file schema is not.

I tested with these values:
{noformat}
recordSchema = ["null","string"]
fileSchema = "string"
{noformat}

And I got the following exception:

{noformat}
org.apache.avro.AvroRuntimeException: Not a union: "string" 
at org.apache.avro.Schema.getTypes(Schema.java:272)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)

{noformat}
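A minimal sketch of the missing check, assuming the fix is to unwrap the file schema only when it is itself a union. SimpleSchema is a hypothetical stand-in for org.apache.avro.Schema (its getTypes() mimics the "Not a union" failure in the trace above), and deserialize/deserializeUnchecked are illustrative names, not the actual AvroDeserializer methods or the HIVE-7049 patch.

```java
import java.util.List;

// Hypothetical stand-in for org.apache.avro.Schema: either a single named
// type or a union of named types.
final class SimpleSchema {
    private final List<String> types;
    private final boolean union;

    private SimpleSchema(List<String> types, boolean union) {
        this.types = types;
        this.union = union;
    }

    static SimpleSchema of(String type) { return new SimpleSchema(List.of(type), false); }

    static SimpleSchema unionOf(String... types) { return new SimpleSchema(List.of(types), true); }

    boolean isUnion() { return union; }

    // Like Avro, asking a non-union schema for its branches fails.
    List<String> getTypes() {
        if (!union) {
            throw new IllegalStateException("Not a union: \"" + types.get(0) + "\"");
        }
        return types;
    }

    // Name of a non-union type such as "string".
    String typeName() {
        if (union) {
            throw new IllegalStateException("a union has no single type name");
        }
        return types.get(0);
    }

    // Nullable means: a union that has a "null" branch.
    boolean isNullable() { return union && types.contains("null"); }

    // The first non-null branch of a union, as a plain schema.
    SimpleSchema nonNullBranch() {
        for (String t : getTypes()) {
            if (!t.equals("null")) {
                return of(t);
            }
        }
        throw new IllegalStateException("union has no non-null branch");
    }
}

final class NullableUnionSketch {

    // Checks both schemas: the file schema is unwrapped only if it is a union.
    static Object deserialize(Object datum, SimpleSchema fileSchema, SimpleSchema recordSchema) {
        if (recordSchema.isNullable()) {
            if (datum == null) {
                return null;
            }
            SimpleSchema fileBranch = fileSchema.isUnion() ? fileSchema.nonNullBranch() : fileSchema;
            return deserializePrimitive(datum, fileBranch, recordSchema.nonNullBranch());
        }
        return deserializePrimitive(datum, fileSchema, recordSchema);
    }

    // The reported pattern: only the record schema is checked, so a non-union
    // file schema reaches getTypes() and fails with "Not a union".
    static Object deserializeUnchecked(Object datum, SimpleSchema fileSchema, SimpleSchema recordSchema) {
        if (recordSchema.isNullable()) {
            if (datum == null) {
                return null;
            }
            return deserializePrimitive(datum, fileSchema.nonNullBranch(), recordSchema.nonNullBranch());
        }
        return deserializePrimitive(datum, fileSchema, recordSchema);
    }

    // Placeholder for real decoding: only checks that the two types line up.
    private static Object deserializePrimitive(Object datum, SimpleSchema file, SimpleSchema record) {
        if (!file.typeName().equals(record.typeName())) {
            throw new IllegalArgumentException("type mismatch: " + file.typeName() + " vs " + record.typeName());
        }
        return datum;
    }
}
```

With recordSchema = ["null","string"] and fileSchema = "string", deserialize returns the datum, while deserializeUnchecked fails in the same way as the reported trace.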






[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Status: Open  (was: Patch Available)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
> HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
> HIVE-3159v1.patch
>
>
> Currently, when writing tables to Avro, one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate Avro schema.
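A minimal sketch of the inference idea, covering primitive columns only. HiveToAvroSchemaSketch, its type mapping, and the choice to wrap every field in a ["null", type] union with a null default (so that Hive NULLs can be written) are assumptions for illustration, not the mapping used by the HIVE-3159 patches; complex types (array, map, struct) and types such as decimal or timestamp are deliberately rejected. Field names are taken from the table's columns.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: derive an Avro record schema (as JSON text) from a
// table's column names and Hive primitive type names.
final class HiveToAvroSchemaSketch {

    // Assumed mapping for a few Hive primitive types; anything else is rejected.
    private static final Map<String, String> PRIMITIVES = Map.of(
        "string", "string",
        "int", "int",
        "bigint", "long",
        "boolean", "boolean",
        "float", "float",
        "double", "double",
        "binary", "bytes");

    record Column(String name, String hiveType) {}

    static String toAvroSchema(String recordName, List<Column> columns) {
        StringJoiner fields = new StringJoiner(",");
        for (Column c : columns) {
            String avroType = PRIMITIVES.get(c.hiveType().toLowerCase(Locale.ROOT));
            if (avroType == null) {
                throw new IllegalArgumentException("type not covered by this sketch: " + c.hiveType());
            }
            // Nullable field: ["null", type] with a null default ("null" must come first).
            fields.add("{\"name\":\"" + c.name() + "\",\"type\":[\"null\",\""
                + avroType + "\"],\"default\":null}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName
            + "\",\"fields\":[" + fields + "]}";
    }
}
```

For a table like simple_kv_txt (key STRING, value STRING), this yields a record with nullable string fields named key and value, rather than internal names such as _col0.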





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-06 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Affects Version/s: 0.12.0
   Status: Patch Available  (was: Open)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch
>
>
> Currently, when writing tables to Avro, one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate Avro schema.





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-06 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.9.patch

New patch that addresses [~cwsteinbach]'s review comments.

This patch adds the following missing functionality:
1. Create an Avro table from the Hive schema (without specifying an Avro schema).
2. Copy Avro table structure and data from an existing non-Avro table using 
CTAS.
3. Copy Avro table structure and data from an existing Avro table using CTAS.

Note: we can close the dependent JIRA HIVE-5803, which is no longer required. Another 
JIRA has already taken care of it.



> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.12.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch
>
>
> Currently, when writing tables to Avro, one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate Avro schema.





[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-04-24 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980646#comment-13980646
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

Planning to upload a new patch by next week.


> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently, when writing tables to Avro, one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate Avro schema.





[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-04-24 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980616#comment-13980616
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

Yes.

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently, when writing tables to Avro, one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate Avro schema.





[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-30 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6638:


Status: Patch Available  (was: Open)

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.11.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-30 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6638:


Attachment: HIVE-6638.2.patch

Uploaded to be consistent with the patch in MAPREDUCE-5812.

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-30 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6638:


Status: Open  (was: Patch Available)

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.11.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-26 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6638:


Status: Patch Available  (was: Open)

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.11.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-26 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948801#comment-13948801
 ] 

Mohammad Kamrul Islam commented on HIVE-6638:
-

In case anyone is interested: testing this is an involved, choreographed 
process. I tested it as follows:

set mapred.map.tasks.speculative.execution=false;
set mapred.job.map.memory.mb=4096;
set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create table load_overwrite (key string, value string) stored as textfile;
load data  local inpath '/tmp/data/' into table load_overwrite;
select key from load_overwrite where length(key) > 0 ;

This assumes /tmp/data contains four copies of kv1.txt.

Tested against Hadoop 2.3 on a single-node Mac. The four tasks run more or less 
sequentially.
Important: when should the MRAM be killed? I killed the MRAM when the second task 
finished; it could be any time before the last one finishes. Command used:
"jps |grep MRAppMaster |cut -d' ' -f1|xargs kill"


I monitored in two ways:
1. cd to HADOOP_LOG_DIR/userlogs/ and run: grep -R "New Final Path" *
This shows which tasks have completed and written their files to HDFS.
2. Run hadoop fs -lsr hdfs://localhost:9000/tmp/hive-/. This shows all of 
the tasks' output during execution. At the end, it is cleaned up.


If you manage to kill the MRAM during execution, you should see only 4 output 
files. More importantly, you will see that the tasks completed before the MRAM 
was killed are never rerun. You also get the correct result.
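The expected outcome of this test can be modeled with a small simulation. This is a hypothetical sketch, not Hive or MapReduce code: tasks run one after another, the AM is killed after a given number of tasks have committed, and a restarted AM either reruns everything (no recovery) or skips tasks whose output was already committed (recovery). In Hadoop 2.x MapReduce, whether a restarted AM may recover committed tasks is decided by the job's OutputCommitter (its isRecoverySupported() and recoverTask() methods); how the HIVE-6638 patch hooks into that is not modeled here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the manual test: tasks run sequentially, the AM
// is killed after `killAfter` tasks have committed, and a second AM attempt
// finishes the job.
final class AmRestartSimulation {

    /** Total task executions across both AM attempts. */
    static int totalExecutions(List<String> tasks, int killAfter, boolean recoverySupported) {
        Set<String> committed = new HashSet<>();
        int executions = 0;

        // First AM attempt: runs tasks until the AM is killed.
        for (int i = 0; i < Math.min(killAfter, tasks.size()); i++) {
            executions++;
            committed.add(tasks.get(i));
        }

        // Second AM attempt: with recovery, committed tasks are skipped;
        // without it, the whole job starts over.
        Set<String> skip = recoverySupported ? committed : Set.<String>of();
        for (String task : tasks) {
            if (!skip.contains(task)) {
                executions++;
            }
        }
        return executions;
    }
}
```

With four tasks and the AM killed after the second one finishes, totalExecutions(tasks, 2, true) is 4 (each task runs exactly once, matching the expectation above), while totalExecutions(tasks, 2, false) is 6.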








> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-26 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6638:


Attachment: HIVE-6638.1.patch

Initial patch.

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6638.1.patch
>
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Status: Patch Available  (was: Open)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch, HIVE-6024.6.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.
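A minimal sketch of the difference on a local file system, using java.nio in place of Hadoop's FileSystem API. The scratch variant copies the file into a scratch directory and then moves it into the table directory (two filesystem steps, analogous to a copy task followed by a move task); the direct variant copies straight into the table directory in one step. The class and method names are hypothetical, and this is not how the HIVE-6024 patch is implemented; it only illustrates why the extra copy phase is redundant.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration on a local file system (java.nio, not Hadoop's
// FileSystem API). Each method returns the number of filesystem steps it used.
final class LocalLoadSketch {

    // Copy into a scratch directory first, then move into the table directory.
    static int loadViaScratch(Path localFile, Path scratchDir, Path tableDir) throws IOException {
        Path staged = scratchDir.resolve(localFile.getFileName());
        Files.copy(localFile, staged, StandardCopyOption.REPLACE_EXISTING);
        Files.move(staged, tableDir.resolve(localFile.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
        return 2;
    }

    // Copy straight into the table directory; the local source is left in place.
    static int loadDirect(Path localFile, Path tableDir) throws IOException {
        Files.copy(localFile, tableDir.resolve(localFile.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
        return 1;
    }
}
```

Both variants leave identical bytes in the table directory; the direct one simply skips the intermediate scratch copy.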





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.6.patch

Re-uploaded to be picked up by Jenkins.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch, HIVE-6024.6.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Status: Open  (was: Patch Available)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: (was: HIVE-6024.5.patch)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Assigned] (HIVE-6638) Hive needs to implement recovery for Application Master restart

2014-03-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam reassigned HIVE-6638:
---

Assignee: Mohammad Kamrul Islam

> Hive needs to implement recovery for Application Master restart 
> 
>
> Key: HIVE-6638
> URL: https://issues.apache.org/jira/browse/HIVE-6638
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
>
> Currently, if the AM restarts, the whole job is restarted. Although the job, and 
> subsequently the query, would still run to completion, it would be nice if Hive 
> did not need to redo all the work done under the previous AM.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.6.patch

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.5.patch, HIVE-6024.6.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.5.patch

Replacing with the intended patch.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.5.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.5.patch

Addressed failed test cases.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch, HIVE-6024.5.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-03-07 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.4.patch

Addressed unit test failures.
Please check the three .q.out changes. They were required because this patch 
removes one extra copy phase.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, 
> HIVE-6024.4.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Commented] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916614#comment-13916614
 ] 

Mohammad Kamrul Islam commented on HIVE-6024:
-

I didn't find any existing .q file that covers this. I made a comment in RB as 
well.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Status: Patch Available  (was: Open)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading from 
> {{local}}. It doesn't create this additional copy task when loading from DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Status: Open  (was: Patch Available)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.3.patch

Updated with review comments. A new .q test is added.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-26 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.2.patch

Rebased

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Commented] (HIVE-5803) Support CTAS from a non-avro table to an avro table

2014-02-19 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906217#comment-13906217
 ] 

Mohammad Kamrul Islam commented on HIVE-5803:
-

The linked Jira might solve this problem as well.

> Support CTAS from a non-avro table to an avro table
> ---
>
> Key: HIVE-5803
> URL: https://issues.apache.org/jira/browse/HIVE-5803
> Project: Hive
>  Issue Type: Task
>Reporter: Mohammad Kamrul Islam
>Assignee: Carl Steinbach
>
> Hive currently does not work with HQL like:
> CREATE TABLE  as SELECT * from ;
> Actually, the statement itself runs successfully, but when I run "SELECT * from 
>  .." it fails.
> This JIRA depends on HIVE-3159, which translates TypeInfo to Avro schema.
> Findings so far: CTAS uses internal column names (instead of the column 
> names provided in the SELECT) when creating the Avro data file. In other 
> words, the Avro data file has column names of the form _col0, _col1, 
> whereas the table column names are different.
> I tested with the following test cases and it failed:
> - verify 1) can create table using create table as select from non-avro table 
> 2) LOAD avro data into new table and read data from the new table
> CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE;
> DESCRIBE simple_kv_txt;
> LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt;
> SELECT * FROM simple_kv_txt ORDER BY KEY;
> CREATE TABLE copy_doctors ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key 
> as key, value as value FROM simple_kv_txt;
> DESCRIBE copy_doctors;
> SELECT * FROM copy_doctors;
>  





[jira] [Commented] (HIVE-6375) Fix CTAS for parquet

2014-02-19 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906215#comment-13906215
 ] 

Mohammad Kamrul Islam commented on HIVE-6375:
-

+1, reviewed the patch.

CTAS for Avro doesn't work for the same reason (HIVE-5803).
Hopefully, the patch will fix Avro as well.

> Fix CTAS for parquet
> 
>
> Key: HIVE-6375
> URL: https://issues.apache.org/jira/browse/HIVE-6375
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Brock Noland
>Assignee: Szehon Ho
>Priority: Critical
>  Labels: Parquet
> Attachments: HIVE-6375.2.patch, HIVE-6375.patch
>
>
> More details here:
> https://github.com/Parquet/parquet-mr/issues/272





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Attachment: HIVE-6024.1.patch

Also updated in RB:
https://reviews.apache.org/r/18065/

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-13 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-6024:


Status: Patch Available  (was: Open)

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch
>
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Assigned] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-07 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam reassigned HIVE-6024:
---

Assignee: Mohammad Kamrul Islam

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
>
> The LOAD DATA command creates an additional copy task only when it is loading 
> from {{local}}. It doesn't create this additional copy task when loading from 
> DFS.





[jira] [Commented] (HIVE-6327) A few mathematic functions don't take decimal input

2014-02-03 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889827#comment-13889827
 ] 

Mohammad Kamrul Islam commented on HIVE-6327:
-

Left a few minor comments in RB.

> A few mathematic functions don't take decimal input
> ---
>
> Key: HIVE-6327
> URL: https://issues.apache.org/jira/browse/HIVE-6327
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6327.patch
>
>
> A few mathematical functions, such as sin(), cos(), etc., don't take a decimal 
> argument.
> {code}
> hive> show tables;
> OK
> Time taken: 0.534 seconds
> hive> create table test(d decimal(5,2));
> OK
> Time taken: 0.351 seconds
> hive> select sin(d) from test;
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'd': No 
> matching method for class org.apache.hadoop.hive.ql.udf.UDFSin with 
> (decimal(5,2)). Possible choices: _FUNC_(double)  
> {code}
> HIVE-6246 covers only the sign() function. The remaining ones include sin, 
> cos, tan, asin, acos, atan, exp, ln, log, log10, log2, radians, and sqrt. 
> These are non-generic UDFs.
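
One straightforward approach, sketched here under the assumption that a decimal argument can simply be widened to double before calling java.lang.Math. The class and method names are hypothetical; this is not the code in HIVE-6327.patch.

```java
import java.math.BigDecimal;

// Hypothetical sketch: accept a decimal argument by widening it to double.
// Widening can lose precision beyond roughly 15-17 significant digits,
// which matters little when the result is a double anyway.
// NULL handling is omitted for brevity.
class DecimalMathSketch {
    static double sin(BigDecimal d) {
        return Math.sin(d.doubleValue());
    }

    static double sqrt(BigDecimal d) {
        return Math.sqrt(d.doubleValue());
    }

    public static void main(String[] args) {
        // A value that fits the decimal(5,2) column from the example above.
        BigDecimal d = new BigDecimal("1.57");
        System.out.println(sin(d));
        System.out.println(sqrt(new BigDecimal("16.00")));
    }
}
```

The same one-line widening would apply to the other single-argument functions in the list.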





[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-01-31 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888288#comment-13888288
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

Patch updated to address the review comments in RB.

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
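
A minimal sketch of the idea, assuming a simple mapping from Hive primitive type names to Avro primitive types, with every field wrapped in a union with "null" because Hive columns are nullable. The class, methods, and mapping are hypothetical and not the code in the attached patches; complex types (array, map, struct) are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: derive an Avro record schema (as JSON text) from
// Hive column names and primitive type names. The mapping is an assumption.
class AvroSchemaSketch {
    static String avroType(String hiveType) {
        switch (hiveType.toLowerCase()) {
            case "string":   return "\"string\"";
            case "tinyint":
            case "smallint":
            case "int":      return "\"int\"";
            case "bigint":   return "\"long\"";
            case "float":    return "\"float\"";
            case "double":   return "\"double\"";
            case "boolean":  return "\"boolean\"";
            case "binary":   return "\"bytes\"";
            default:
                throw new IllegalArgumentException("unsupported type: " + hiveType);
        }
    }

    // Each field becomes a ["null", type] union so NULL values can be written.
    static String recordSchema(String name, Map<String, String> columns) {
        StringJoiner fields = new StringJoiner(",");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            fields.add("{\"name\":\"" + c.getKey() + "\",\"type\":[\"null\","
                    + avroType(c.getValue()) + "]}");
        }
        return "{\"type\":\"record\",\"name\":\"" + name
                + "\",\"fields\":[" + fields + "]}";
    }

    public static void main(String[] args) {
        // Insertion order is kept so the field order matches the table columns.
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("key", "string");
        cols.put("value", "string");
        System.out.println(recordSchema("simple_kv", cols));
    }
}
```

Using a LinkedHashMap matters here: Avro resolves records by field name, but a schema whose field order matches the table's column order is easier to compare against DESCRIBE output.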





[jira] [Commented] (HIVE-6246) Sign(a) UDF is not supported for decimal type

2014-01-23 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880483#comment-13880483
 ] 

Mohammad Kamrul Islam commented on HIVE-6246:
-

Left comments in RB.

> Sign(a) UDF is not supported for decimal type
> -
>
> Key: HIVE-6246
> URL: https://issues.apache.org/jira/browse/HIVE-6246
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6246.patch
>
>
> java.sql.SQLException: Error while compiling statement: FAILED: 
> SemanticException [Error 10014]: Line 1:86 Wrong arguments 'a': No matching 
> method for class org.apache.hadoop.hive.ql.udf.UDFSign with (decimal(38,10)). 
> Possible choices: _FUNC_(double)
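
Unlike sin(), sign() does not need a detour through double: java.math.BigDecimal.signum() returns the sign of a decimal exactly, whatever its precision and scale. A minimal sketch (hypothetical names, not the HIVE-6246 patch) that also keeps SQL NULL semantics by returning null for a null input:

```java
import java.math.BigDecimal;

// Hypothetical sketch: sign of a decimal value without converting to double.
class DecimalSignSketch {
    // Returns -1, 0 or 1; null input yields null (SQL NULL in, NULL out).
    static Integer sign(BigDecimal d) {
        return d == null ? null : d.signum();
    }

    public static void main(String[] args) {
        // A decimal(38,10)-sized value like the one in the error above.
        System.out.println(sign(new BigDecimal("1234567890123456789012345678.0000000001")));
        System.out.println(sign(new BigDecimal("-12.50")));
        System.out.println(sign(new BigDecimal("0.00")));
        System.out.println(sign(null));
    }
}
```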





[jira] [Commented] (HIVE-6182) LDAP Authentication errors need to be more informative

2014-01-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869175#comment-13869175
 ] 

Mohammad Kamrul Islam commented on HIVE-6182:
-

So what is the plan for the Beeline exception?
No fix, or a fix in a different JIRA?

> LDAP Authentication errors need to be more informative
> --
>
> Key: HIVE-6182
> URL: https://issues.apache.org/jira/browse/HIVE-6182
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6182.patch
>
>
> Many different errors can occur when logging into an LDAP-enabled 
> HiveServer2 from Beeline, but for any error there is only a generic log 
> message:
> {code}
> SASL negotiation failure
> javax.security.sasl.SaslException: PLAIN auth failed: Error validating LDAP 
> user
>   at 
> org.apache.hadoop.security.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108)
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrRespons
> {code}
> And on Beeline side there is only an even more unhelpful message:
> {code}
> Error: Invalid URL: jdbc:hive2://localhost:1/default (state=08S01,code=0)
> {code}
> It would be good to print out the underlying error message, at least in the 
> log if not in Beeline. Today these messages are swallowed. This is bad because 
> the underlying message is the most important part: it carries the error codes 
> shown here: [LDAP error 
> code|https://wiki.servicenow.com/index.php?title=LDAP_Error_Codes]
> Beeline seems to throw that exception for any error during connection, 
> authentication or otherwise.
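
A minimal sketch of surfacing the underlying message by walking the exception's cause chain instead of reporting only the outermost message. RootCauseSketch is a hypothetical name, not HiveServer2 code, and the exception chain built in main() is illustrative; the LDAP message text in particular is an assumption.

```java
import javax.naming.AuthenticationException;
import javax.security.sasl.SaslException;

// Hypothetical sketch: report the innermost cause alongside the top-level error.
class RootCauseSketch {
    // Follow getCause() to the innermost exception. The depth cap keeps a
    // cyclic cause chain from looping forever.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; depth < 64 && cur.getCause() != null; depth++) {
            cur = cur.getCause();
        }
        return cur;
    }

    static String describe(Throwable t) {
        Throwable root = rootCause(t);
        return t.getMessage() + " (caused by "
                + root.getClass().getSimpleName() + ": " + root.getMessage() + ")";
    }

    public static void main(String[] args) {
        // Illustrative chain: an LDAP failure wrapped in a SASL failure.
        Exception ldap = new AuthenticationException("[LDAP: error code 49 - Invalid Credentials]");
        Exception sasl = new SaslException("PLAIN auth failed: Error validating LDAP user", ldap);
        System.out.println(describe(sasl));
    }
}
```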





[jira] [Commented] (HIVE-6174) Beeline "set varible" doesn't show the value of the variable as Hive CLI

2014-01-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869172#comment-13869172
 ] 

Mohammad Kamrul Islam commented on HIVE-6174:
-

+1
Looks very straightforward.

> Beeline "set varible" doesn't show the value of the variable as Hive CLI
> 
>
> Key: HIVE-6174
> URL: https://issues.apache.org/jira/browse/HIVE-6174
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5174.3.patch, HIVE-6174.2.patch, HIVE-6174.patch
>
>
> Currently it displays nothing.
> {code}
> 0: jdbc:hive2://> set env:TERM; 
> 0: jdbc:hive2://> 
> {code}
> In contrast,  Hive CLI displays the value of the variable.
> {code}
> hive> set env:TERM; 
> env:TERM=xterm
> {code}





[jira] [Commented] (HIVE-6185) DDLTask is inconsistent in creating a table and adding a partition when dealing with location

2014-01-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869170#comment-13869170
 ] 

Mohammad Kamrul Islam commented on HIVE-6185:
-

Patch looks good!
A few comments:
1. In Partition::setBucketCount(),
FileSystem fs = FileSystem.get(getDataLocation().toUri(), Hive.get().getConf());
can be rewritten as (to make it consistent with other places):
FileSystem fs = getDataLocation().getFileSystem(Hive.get().getConf());

2. Same thing in SamplePruner::limitPrune():
FileSystem fs = FileSystem.get(part.getDataLocation().toUri(), Hive.get().getConf());
can be rewritten as:
FileSystem fs = part.getDataLocation().getFileSystem(Hive.get().getConf());

3. In Partition.java:

A new method "public Path getDataLocation()" is introduced. Does it replace
"public Path getPartitionPath()" or "final public URI getDataLocation()"? If
it is the latter, do we need to keep the "final" modifier?
 

> DDLTask is inconsistent in creating a table and adding a partition when 
> dealing with location
> -
>
> Key: HIVE-6185
> URL: https://issues.apache.org/jira/browse/HIVE-6185
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6185.1.patch, HIVE-6185.2.patch, HIVE-6185.patch, 
> HIVE-6185.patch
>
>
> When creating a table, Hive uses URI to represent location:
> {code}
> if (crtTbl.getLocation() != null) {
>   tbl.setDataLocation(new Path(crtTbl.getLocation()).toUri());
> }
> {code}
> When adding a partition, Hive uses Path to represent location:
> {code}
>   // set partition path relative to table
>   db.createPartition(tbl, addPartitionDesc.getPartSpec(), new Path(tbl
> .getPath(), addPartitionDesc.getLocation()), 
> addPartitionDesc.getPartParams(),
> addPartitionDesc.getInputFormat(),
> addPartitionDesc.getOutputFormat(),
> addPartitionDesc.getNumBuckets(),
> addPartitionDesc.getCols(),
> addPartitionDesc.getSerializationLib(),
> addPartitionDesc.getSerdeParams(),
> addPartitionDesc.getBucketCols(),
> addPartitionDesc.getSortCols());
> {code}
> This disparity causes the values stored in the metastore to be encoded 
> differently, which leads to problems with special characters, as demonstrated 
> in HIVE-5446. As a result, the code that handles table locations differs from 
> the code that handles partition locations, creating a maintenance burden.
> We need to standardize on Path, in line with the other Path-related cleanup 
> efforts.
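
The encoding disparity can be reproduced with java.net.URI alone. A minimal sketch (it does not use Hadoop's Path, and the location string is hypothetical): the multi-argument URI constructor percent-encodes a space, so the same directory ends up as two different strings depending on whether a path-style or a URI-style value is persisted, and every reader of the URI-style value must remember to decode it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: one location, two stored string forms.
class LocationEncodingSketch {
    // URI-style: the multi-argument URI constructor percent-encodes
    // characters that are illegal in a URI path (a space becomes %20).
    static String asUriString(String rawPath) {
        try {
            return new URI("file", null, rawPath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reading a URI-style string back requires decoding it again.
    static String decodedPath(String uriString) {
        return URI.create(uriString).getPath();
    }

    public static void main(String[] args) {
        String raw = "/warehouse/t/ds=2014 01 08"; // hypothetical partition location
        String encoded = asUriString(raw);
        System.out.println(raw);     // path-style: stored as-is
        System.out.println(encoded); // URI-style: percent-encoded
        System.out.println(raw.equals(encoded));              // false
        System.out.println(raw.equals(decodedPath(encoded))); // true
    }
}
```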





[jira] [Commented] (HIVE-6171) Use Paths consistently - V

2014-01-08 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866284#comment-13866284
 ] 

Mohammad Kamrul Islam commented on HIVE-6171:
-

+1 for the latest patch.

Minor comment: the method name could also be changed from
"getExternalTmpFileURI" to "getExternalTmpPath" to be more specific.

> Use Paths consistently - V
> --
>
> Key: HIVE-6171
> URL: https://issues.apache.org/jira/browse/HIVE-6171
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6171.1.patch, HIVE-6171.patch
>
>
> Next in series for consistent usage of Paths in Hive.





[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-01-08 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866272#comment-13866272
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

[~cwsteinbach] I can't reproduce it.
Uploaded a rebased version of the patch.
 

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-01-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.7.patch

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-01-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Status: Patch Available  (was: Open)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159.7.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2014-01-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, 
> HIVE-5829.4.patch, tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2014-01-06 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: HIVE-5829.4.patch

Addressed the reviewer's comments.

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, 
> HIVE-5829.4.patch, tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-12-24 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.7.patch

Included review comments

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch, HIVE-5731.6.patch, HIVE-5731.7.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its comments as:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array, array> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."
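
Point 4 (short-circuit evaluation) is the least obvious of the four. A minimal sketch of the idea, using java.util.function.Supplier as a stand-in for Hive's deferred argument objects; the class and method names are hypothetical and this is not the GenericUDF API. Arguments are wrapped so that the function evaluates only as many of them as it needs.

```java
import java.util.function.Supplier;

// Hypothetical sketch: lazily evaluated arguments enable short-circuiting.
class DeferredArgsSketch {
    // Returns the first non-null argument (COALESCE-like), evaluating the
    // arguments in order and stopping as soon as one yields a value.
    @SafeVarargs
    static <T> T coalesce(Supplier<T>... args) {
        for (Supplier<T> arg : args) {
            T v = arg.get();
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] evaluated = {0};
        String r = coalesce(
            () -> { evaluated[0]++; return "first"; },
            () -> { evaluated[0]++; return "second"; });
        // The second argument is never evaluated.
        System.out.println(r + " after " + evaluated[0] + " evaluation(s)");
    }
}
```

With eagerly evaluated arguments, as in a plain UDF, both expressions would be computed before the function body runs.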





[jira] [Commented] (HIVE-5992) Hive inconsistently converts timestamp in AVG and SUM UDAF's

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852540#comment-13852540
 ] 

Mohammad Kamrul Islam commented on HIVE-5992:
-

Looks good.
+1

> Hive inconsistently converts timestamp in AVG and SUM UDAF's
> 
>
> Key: HIVE-5992
> URL: https://issues.apache.org/jira/browse/HIVE-5992
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5992.patch
>
>
> {code}
> hive> select t, sum(t), count(*), sum(t)/count(*), avg(t) from ts group by t;
> ...
> OK
> 1977-03-15 12:34:22.345678 227306062  1  227306062  2.27306062345678E8
> {code}
> As can be seen, the timestamp value (1977-03-15 12:34:22.345678) is converted 
> with the fractional part ignored in sum() but preserved in avg(). As a 
> result, sum()/count() is not equivalent to avg().
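
The asymmetry is consistent with two different timestamp-to-number conversions. A minimal sketch using java.time (names hypothetical; not Hive's code): one conversion keeps only whole epoch seconds, the other keeps the fraction. The UTC-8 offset is an assumption, chosen because it reproduces the 227306062 in the output above.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of two timestamp-to-number conversions.
class TimestampToNumberSketch {
    static Instant parse(String isoLocal, ZoneOffset offset) {
        return LocalDateTime.parse(isoLocal).toInstant(offset);
    }

    // Whole seconds only: the fractional part is dropped (what sum() appears to do).
    static long wholeSeconds(Instant t) {
        return t.getEpochSecond();
    }

    // Seconds with the fraction kept (what avg() appears to do).
    static double fractionalSeconds(Instant t) {
        return t.getEpochSecond() + t.getNano() / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        Instant t = parse("1977-03-15T12:34:22.345678", ZoneOffset.ofHours(-8));
        System.out.println(wholeSeconds(t)); // 227306062
        System.out.println(fractionalSeconds(t));
    }
}
```

Making both aggregates use the same conversion is what would make sum()/count() agree with avg().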





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: HIVE-5829.3.patch

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, 
> tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Status: Patch Available  (was: Open)

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, 
> tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: (was: HIVE-5829.3.patch)

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.6.patch

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2013-12-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Status: Patch Available  (was: Open)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, 
> HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-12-17 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: HIVE-5829.3.patch

Addresses Carl's comment about moving the Test* file to the correct location.

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, 
> tmp.HIVE-5829.patch
>
>
> This JIRA includes following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-12-16 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: HIVE-5829.2.patch
tmp.HIVE-5829.patch

Addressed the failed test case and rebased with latest code base.

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch
>
>
> This JIRA includes the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-11-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Status: Patch Available  (was: Open)

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch
>
>
> This JIRA covers the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-11-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5829:


Attachment: HIVE-5829.1.patch

Also updated on Review Board: https://reviews.apache.org/r/15654/

> Rewrite Trim and Pad UDFs based on GenericUDF
> -
>
> Key: HIVE-5829
> URL: https://issues.apache.org/jira/browse/HIVE-5829
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5829.1.patch
>
>
> This JIRA covers the following UDFs:
> 1. trim()
> 2. ltrim()
> 3. rtrim()
> 4. lpad()
> 5. rpad()





[jira] [Created] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF

2013-11-14 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-5829:
---

 Summary: Rewrite Trim and Pad UDFs based on GenericUDF
 Key: HIVE-5829
 URL: https://issues.apache.org/jira/browse/HIVE-5829
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


This JIRA covers the following UDFs:
1. trim()
2. ltrim()
3. rtrim()
4. lpad()
5. rpad()
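To make the behavior of these five functions concrete, here is a minimal standalone Java sketch of their string semantics, independent of the GenericUDF API this JIRA targets. It assumes, from my reading of Hive's function documentation, that trim/ltrim/rtrim strip only the space character, and that lpad/rpad repeat the pad string up to the target length and truncate the input when it is already longer. Returning null for an empty pad string or a negative length is an assumption of this sketch, and the class name TrimPadSketch is hypothetical.

```java
// Hypothetical standalone sketch of trim/pad string semantics (not Hive's API).
final class TrimPadSketch {

    // Strips leading and trailing ' ' characters only
    // (assumption: tabs and other whitespace are left alone).
    static String trim(String s) {
        return s == null ? null : rtrim(ltrim(s));
    }

    static String ltrim(String s) {
        if (s == null) return null;
        int i = 0;
        while (i < s.length() && s.charAt(i) == ' ') i++;
        return s.substring(i);
    }

    static String rtrim(String s) {
        if (s == null) return null;
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    // Left-pads s with repetitions of pad up to len characters; if s is
    // already at least len characters long, it is truncated to len.
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) return null; // assumption for len < 0
        if (s.length() >= len) return s.substring(0, len);
        if (pad.isEmpty()) return null; // assumption: cannot pad with an empty string
        StringBuilder sb = new StringBuilder(len);
        int fill = len - s.length();
        for (int i = 0; i < fill; i++) sb.append(pad.charAt(i % pad.length()));
        return sb.append(s).toString();
    }

    // Same as lpad, but the padding goes after s.
    static String rpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) return null;
        if (s.length() >= len) return s.substring(0, len);
        if (pad.isEmpty()) return null;
        StringBuilder sb = new StringBuilder(len).append(s);
        int fill = len - s.length();
        for (int i = 0; i < fill; i++) sb.append(pad.charAt(i % pad.length()));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("  hive  ") + "]");
        System.out.println(lpad("hi", 5, "ab"));
        System.out.println(rpad("hi", 5, "ab"));
        System.out.println(lpad("hello", 3, "x"));
    }
}
```

A real GenericUDF would additionally check its argument ObjectInspectors in initialize() and read its inputs through them in evaluate(); that plumbing is deliberately left out so only the string logic is visible.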






[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2013-11-14 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.5.patch

Rebased onto the new Maven-based codebase.

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.
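As a rough illustration of the mapping this issue proposes, here is a hypothetical Java sketch that turns column names and Hive primitive type names into an Avro record schema in JSON. It covers only a subset of primitive types, makes every field a ["null", type] union so that Hive NULLs can be written (a design choice of this sketch, not necessarily what the patch does), and uses neither Hive's TypeInfo classes nor Avro's Schema class; the class name AvroSchemaSketch is made up.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: derive an Avro record schema (JSON) from Hive column
// names and primitive type names. Not Hive's actual implementation.
final class AvroSchemaSketch {

    // Hive primitive type name -> Avro primitive type name (subset only).
    private static final Map<String, String> PRIMITIVES = Map.of(
            "boolean", "boolean",
            "tinyint", "int",      // Avro has no 8- or 16-bit integer types
            "smallint", "int",
            "int", "int",
            "bigint", "long",
            "float", "float",
            "double", "double",
            "string", "string",
            "binary", "bytes");

    static String toAvroSchema(String recordName, List<String> names, List<String> hiveTypes) {
        if (names.size() != hiveTypes.size()) {
            throw new IllegalArgumentException("names and types differ in length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
        for (int i = 0; i < names.size(); i++) {
            String avroType = PRIMITIVES.get(hiveTypes.get(i).toLowerCase(Locale.ROOT));
            if (avroType == null) {
                throw new IllegalArgumentException("unsupported type: " + hiveTypes.get(i));
            }
            if (i > 0) sb.append(',');
            // Nullable union so that Hive NULL values can be written.
            sb.append("{\"name\":\"").append(names.get(i))
              .append("\",\"type\":[\"null\",\"").append(avroType).append("\"]}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toAvroSchema("simple_kv",
                List.of("key", "value"), List.of("string", "string")));
    }
}
```

Complex types (array, map, struct) would need recursive handling, which is where walking the TypeInfo tree would come in; this sketch simply rejects them.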





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-14 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.6.patch

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch, HIVE-5731.6.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-12 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820736#comment-13820736
 ] 

Mohammad Kamrul Islam commented on HIVE-5731:
-

Review Board request updated.

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2013-11-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Affects Version/s: (was: 0.11.0)
   (was: 0.10.0)
   Status: Patch Available  (was: Open)

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables

2013-11-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-3159:


Attachment: HIVE-3159.4.patch

> Update AvroSerde to determine schema of new tables
> --
>
> Key: HIVE-3159
> URL: https://issues.apache.org/jira/browse/HIVE-3159
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0, 0.11.0
>Reporter: Jakob Homan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-3159.4.patch, HIVE-3159v1.patch
>
>
> Currently when writing tables to Avro one must manually provide an Avro 
> schema that matches what is being delivered by Hive. It'd be better to have 
> the serde infer this schema by converting the table's TypeInfo into an 
> appropriate AvroSchema.





[jira] [Created] (HIVE-5803) Support CTAS from a non-avro table to an avro table

2013-11-12 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-5803:
---

 Summary: Support CTAS from a non-avro table to an avro table
 Key: HIVE-5803
 URL: https://issues.apache.org/jira/browse/HIVE-5803
 Project: Hive
  Issue Type: Task
Reporter: Mohammad Kamrul Islam


Hive currently does not handle HQL like:
CREATE TABLE  as SELECT * from ;
Strictly speaking, the CTAS statement itself succeeds, but running "SELECT * from  .."
on the new table afterwards fails.

This JIRA depends on HIVE-3159, which translates a TypeInfo into an Avro schema.
Findings so far: CTAS uses internal column names (instead of the column names 
provided in the SELECT) when creating the Avro data file. In other words, the Avro 
data file has column names of the form _col0, _col1, whereas the table's column 
names are different.
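The finding above matters because Avro resolves a reader's record fields against the writer's fields by name (per the schema-resolution rules in the Avro specification, a reader field with no default that is missing from the writer schema is an error). The following toy Java model does not use the Avro library, and its class name CtasNameSketch is hypothetical: it reads a row written under the internal names _col0, _col1 through the table's column names key, value, which fails, while the same row written under the SELECT aliases resolves cleanly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of Avro-style name-based field resolution (no Avro library used).
final class CtasNameSketch {

    // For each reader field, returns the writer value with the same name.
    // As with a reader field that has no default, a missing name is an error.
    static List<String> resolve(List<String> writerNames, List<String> row,
                                List<String> readerNames) {
        Map<String, String> byName = new HashMap<>();
        for (int i = 0; i < writerNames.size(); i++) {
            byName.put(writerNames.get(i), row.get(i));
        }
        List<String> out = new ArrayList<>();
        for (String name : readerNames) {
            if (!byName.containsKey(name)) {
                throw new IllegalStateException("no writer field named " + name
                        + " in " + writerNames);
            }
            out.add(byName.get(name));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = List.of("k1", "v1");
        List<String> table = List.of("key", "value");
        // Data written with the SELECT aliases resolves cleanly.
        System.out.println(resolve(List.of("key", "value"), row, table));
        // Data written with the internal names does not.
        try {
            resolve(List.of("_col0", "_col1"), row, table);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Matching by name rather than by position is what makes the _colN naming visible to readers; a positional reader would not notice the mismatch.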

I tested with the following test case, and it failed:
- verify that 1) a table can be created using CREATE TABLE AS SELECT from a non-Avro table,
and 2) Avro data is loaded into the new table and can be read back from it
CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE;
DESCRIBE simple_kv_txt;
LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt;
SELECT * FROM simple_kv_txt ORDER BY KEY;

CREATE TABLE copy_doctors ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key as 
key, value as value FROM simple_kv_txt;
DESCRIBE copy_doctors;

SELECT * FROM copy_doctors;




 






[jira] [Assigned] (HIVE-5803) Support CTAS from a non-avro table to an avro table

2013-11-12 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam reassigned HIVE-5803:
---

Assignee: Carl Steinbach

> Support CTAS from a non-avro table to an avro table
> ---
>
> Key: HIVE-5803
> URL: https://issues.apache.org/jira/browse/HIVE-5803
> Project: Hive
>  Issue Type: Task
>Reporter: Mohammad Kamrul Islam
>Assignee: Carl Steinbach
>
> Hive currently does not handle HQL like:
> CREATE TABLE  as SELECT * from ;
> Strictly speaking, the CTAS statement itself succeeds, but running "SELECT * from  .."
> on the new table afterwards fails.
> This JIRA depends on HIVE-3159, which translates a TypeInfo into an Avro schema.
> Findings so far: CTAS uses internal column names (instead of the column names 
> provided in the SELECT) when creating the Avro data file. In other words, the Avro 
> data file has column names of the form _col0, _col1, whereas the table's column 
> names are different.
> I tested with the following test case, and it failed:
> - verify that 1) a table can be created using CREATE TABLE AS SELECT from a non-Avro table,
> and 2) Avro data is loaded into the new table and can be read back from it
> CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE;
> DESCRIBE simple_kv_txt;
> LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt;
> SELECT * FROM simple_kv_txt ORDER BY KEY;
> CREATE TABLE copy_doctors ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key 
> as key, value as value FROM simple_kv_txt;
> DESCRIBE copy_doctors;
> SELECT * FROM copy_doctors;
>  





[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819320#comment-13819320
 ] 

Mohammad Kamrul Islam commented on HIVE-5731:
-

[~appodictic]: I don't know whether anyone has benchmarked the two against each 
other.



> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.5.patch

Incorporated Ashutosh's comment.

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5790) maven test build failure shows wrong error message

2013-11-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5790:


Status: Patch Available  (was: Open)

> maven test build failure shows wrong error message
> ---
>
> Key: HIVE-5790
> URL: https://issues.apache.org/jira/browse/HIVE-5790
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5790.1.patch
>
>
> The following error message was correct for the Ant build:
> "See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
> more logs."
> This JIRA replaces it with a Maven-specific error message.





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.4.patch

Fixed the test case failure reported by the build.

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5790) maven test build failure shows wrong error message

2013-11-08 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5790:


Attachment: HIVE-5790.1.patch

Initial patch.

> maven test build failure shows wrong error message
> ---
>
> Key: HIVE-5790
> URL: https://issues.apache.org/jira/browse/HIVE-5790
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5790.1.patch
>
>
> The following error message was correct for the Ant build:
> "See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
> more logs."
> This JIRA replaces it with a Maven-specific error message.





[jira] [Created] (HIVE-5790) maven test build failure shows wrong error message

2013-11-08 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-5790:
---

 Summary: maven test build failure shows wrong error message
 Key: HIVE-5790
 URL: https://issues.apache.org/jira/browse/HIVE-5790
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


The following error message was correct for the Ant build:
"See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs."
This JIRA replaces it with a Maven-specific error message.






[jira] [Updated] (HIVE-5753) Remove collector from Operator base class

2013-11-06 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5753:


Status: Patch Available  (was: Open)

> Remove collector from Operator base class
> -
>
> Key: HIVE-5753
> URL: https://issues.apache.org/jira/browse/HIVE-5753
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5753.1.patch
>
>
> The collector is required by only a few operators. Managing it in the base class 
> is overkill and poor design. This JIRA refactors the code to push the collector 
> down to the operators that require it.
> Background:
> https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665





[jira] [Updated] (HIVE-5753) Remove collector from Operator base class

2013-11-06 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5753:


Attachment: HIVE-5753.1.patch

> Remove collector from Operator base class
> -
>
> Key: HIVE-5753
> URL: https://issues.apache.org/jira/browse/HIVE-5753
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5753.1.patch
>
>
> The collector is required by only a few operators. Managing it in the base class 
> is overkill and poor design. This JIRA refactors the code to push the collector 
> down to the operators that require it.
> Background:
> https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665





[jira] [Created] (HIVE-5753) Remove collector from Operator base class

2013-11-06 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-5753:
---

 Summary: Remove collector from Operator base class
 Key: HIVE-5753
 URL: https://issues.apache.org/jira/browse/HIVE-5753
 Project: Hive
  Issue Type: Improvement
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam


The collector is required by only a few operators. Managing it in the base class 
is overkill and poor design. This JIRA refactors the code to push the collector 
down to the operators that require it.

Background:
https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665
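A minimal sketch of the refactoring direction described above, in plain Java: the base operator class no longer carries a collector, and only an operator that actually hands rows to an external sink holds one, received through its constructor. All names here (RowCollector, BaseOperator, CollectingOperator, FilterOperatorSketch, CollectorDemo) are hypothetical and do not correspond to Hive's actual Operator hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Hive's Operator class hierarchy.
interface RowCollector {
    void collect(Object row);
}

// Base class: no collector field, only the row-processing contract.
abstract class BaseOperator {
    private final List<BaseOperator> children = new ArrayList<>();

    void addChild(BaseOperator child) { children.add(child); }

    abstract void process(Object row);

    // Default forwarding to child operators.
    protected void forward(Object row) {
        for (BaseOperator child : children) child.process(row);
    }
}

// An operator that needs a collector receives it explicitly.
final class CollectingOperator extends BaseOperator {
    private final RowCollector collector;

    CollectingOperator(RowCollector collector) { this.collector = collector; }

    @Override
    void process(Object row) { collector.collect(row); }
}

// An operator that only forwards rows never sees a collector at all.
final class FilterOperatorSketch extends BaseOperator {
    @Override
    void process(Object row) {
        if (row != null) forward(row); // drop null rows, pass the rest on
    }
}

final class CollectorDemo {
    public static void main(String[] args) {
        List<Object> sink = new ArrayList<>();
        FilterOperatorSketch filter = new FilterOperatorSketch();
        filter.addChild(new CollectingOperator(sink::add));
        filter.process("a");
        filter.process(null);
        filter.process("b");
        System.out.println(sink);
    }
}
```

Constructor injection is only one option; the point is that the dependency lives where it is used, so operators that never emit to a collector carry no unused state.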





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-05 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.3.patch

Addressed the build error.

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY

2013-11-04 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5221:


Status: Patch Available  (was: Open)

> Issue in column type with data type as BINARY
> -
>
> Key: HIVE-5221
> URL: https://issues.apache.org/jira/browse/HIVE-5221
> Project: Hive
>  Issue Type: Bug
>Reporter: Arun Vasu
>Assignee: Mohammad Kamrul Islam
>Priority: Critical
> Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch
>
>
> Hi,
> I am using Hive 0.10. When I create an external table with a column of type 
> BINARY, query results on the table show junk values for the BINARY 
> column.
> Below is the query I used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY)
>  ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '^'
>LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I ran is: select * from bool1
> The sample data in the hdfs file is:
> 0^a...@abc.com^001
> 1^a...@abc.com^010
>  ^a...@abc.com^011
>  ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^^01100010
> 12344^^0111
> Please share your input if possible.
> Thanks,
> Arun
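For anyone reproducing this report, a small Java sketch of how the sample lines split under FIELDS TERMINATED BY '^' may help: empty fields (as in 123^^01100010) are kept, and the bitfld value arrives as the ASCII characters '0' and '1' (bytes 48 and 49), not as individual bits. This only models the field split and the raw bytes in the text file; it does not reproduce Hive's text SerDe and is not a confirmed explanation of the junk output. The class name DelimitedLineSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch: split a '^'-delimited text line as the table
// definition describes, keeping empty fields.
final class DelimitedLineSketch {

    static String[] split(String line) {
        // '^' is a regex metacharacter, so quote it; limit -1 keeps trailing empties.
        return line.split(Pattern.quote("^"), -1);
    }

    // Raw bytes of a field as stored in the text file (UTF-8).
    static byte[] rawBytes(String field) {
        return field.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] fields = split("123^^01100010");
        System.out.println(Arrays.toString(fields));
        System.out.println(Arrays.toString(rawBytes(fields[2])));
    }
}
```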





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-04 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.2.patch

Review Board: https://reviews.apache.org/r/15213/

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-04 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Status: Patch Available  (was: Open)

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Assigned] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-04 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam reassigned HIVE-5731:
---

Assignee: Mohammad Kamrul Islam

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-04 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5731:


Attachment: HIVE-5731.1.patch

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch
>
>
> The GenericUDF class is the latest and recommended base class for UDFs.
> This JIRA changes the current UDFDate* classes to extend GenericUDF.
> The general benefits of GenericUDF are described in its source comments:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signature - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."





[jira] [Created] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-01 Thread Mohammad Kamrul Islam (JIRA)
Mohammad Kamrul Islam created HIVE-5731:
---

 Summary: Use new GenericUDF instead of basic UDF for UDFDate* 
classes 
 Key: HIVE-5731
 URL: https://issues.apache.org/jira/browse/HIVE-5731
 Project: Hive
  Issue Type: Improvement
Reporter: Mohammad Kamrul Islam


The GenericUDF class is the latest and recommended base class for UDFs.
This JIRA changes the current UDFDate* classes to extend GenericUDF.

The general benefits of GenericUDF are described in its source comments:

"* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
accept arguments of complex types, and return complex types. 2. It can accept
variable length of arguments. 3. It can accept an infinite number of function
signature - for example, it's easy to write a GenericUDF that accepts
array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
can do short-circuit evaluations using DeferedObject."
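Point 4 of the quoted comment (short-circuit evaluation through deferred arguments) is the least obvious benefit. As I understand it, GenericUDF.evaluate receives its arguments as deferred objects whose values are computed only when the function asks for them. The following standalone Java sketch illustrates just that idea; the Deferred interface and the coalesce-style function are hypothetical stand-ins, not Hive's DeferredObject API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a lazily evaluated argument.
interface Deferred {
    Object get();
}

final class ShortCircuitSketch {

    // COALESCE-style function: returns the first non-null argument and
    // never evaluates the arguments that follow it.
    static Object coalesce(Deferred... args) {
        for (Deferred arg : args) {
            Object value = arg.get();
            if (value != null) return value;
        }
        return null;
    }

    public static void main(String[] args) {
        AtomicInteger evaluations = new AtomicInteger();
        Object result = coalesce(
                () -> { evaluations.incrementAndGet(); return null; },
                () -> { evaluations.incrementAndGet(); return "second"; },
                () -> { evaluations.incrementAndGet(); return "third"; });
        System.out.println(result + " after " + evaluations.get() + " evaluations");
    }
}
```

With eagerly evaluated arguments, all three expressions would run before the function body; with deferred ones, the third is never computed.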







[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY

2013-11-01 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5221:


Summary: Issue in column type with data type as BINARY  (was: Issue in 
colun type with data type as BINARY)

> Issue in column type with data type as BINARY
> -
>
> Key: HIVE-5221
> URL: https://issues.apache.org/jira/browse/HIVE-5221
> Project: Hive
>  Issue Type: Bug
>Reporter: Arun Vasu
>Assignee: Mohammad Kamrul Islam
>Priority: Critical
> Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch
>
>
> Hi,
> I am using Hive 0.10. When I create an external table with a column of type 
> BINARY, query results on the table show junk values for the BINARY 
> column.
> Below is the query I used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY)
>  ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '^'
>LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I ran is: select * from bool1
> The sample data in the hdfs file is:
> 0^a...@abc.com^001
> 1^a...@abc.com^010
>  ^a...@abc.com^011
>  ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^^01100010
> 12344^^0111
> Please share your input if possible.
> Thanks,
> Arun





[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY

2013-11-01 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5221:


Attachment: HIVE-5221.2.patch

Updated with Ashutosh's comment.

> Issue in column type with data type as BINARY
> -
>
> Key: HIVE-5221
> URL: https://issues.apache.org/jira/browse/HIVE-5221
> Project: Hive
>  Issue Type: Bug
>Reporter: Arun Vasu
>Assignee: Mohammad Kamrul Islam
>Priority: Critical
> Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch
>
>
> Hi,
> I am using Hive 0.10. When I create an external table with a column of type 
> BINARY, query results on the table show junk values for the BINARY 
> column.
> Below is the query I used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY)
>  ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '^'
>LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I ran is: select * from bool1
> The sample data in the hdfs file is:
> 0^a...@abc.com^001
> 1^a...@abc.com^010
>  ^a...@abc.com^011
>  ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^^01100010
> 12344^^0111
> Please share your input if possible.
> Thanks,
> Arun




