[jira] [Created] (HIVE-12608) Parquet Schema Evolution doesn't work when a column is dropped from array<struct<>>
Mohammad Kamrul Islam created HIVE-12608: Summary: Parquet Schema Evolution doesn't work when a column is dropped from array<struct<>> Key: HIVE-12608 URL: https://issues.apache.org/jira/browse/HIVE-12608 Project: Hive Issue Type: Bug Components: File Formats Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam
When a column is dropped from an array<struct<>>, I got the following exception. I used the following SQL to test it.
{quote}
CREATE TABLE arrays_of_struct_to_map (locations1 array<struct<c1:int,c2:int>>, locations2 array<struct<f1:int,f2:int,f3:int>>) STORED AS PARQUET;
INSERT INTO TABLE arrays_of_struct_to_map select array(named_struct("c1",1,"c2",2)), array(named_struct("f1", 77,"f2",88,"f3",99)) FROM parquet_type_promotion LIMIT 1;
SELECT * FROM arrays_of_struct_to_map;
-- Testing schema evolution of dropping column from array<struct<>>
ALTER TABLE arrays_of_struct_to_map REPLACE COLUMNS (locations1 array<struct<c1:int>>, locations2 array<...>);
SELECT * FROM arrays_of_struct_to_map;
{quote}
{quote}
2015-12-07 11:47:28,503 ERROR [main]: CliDriver (SessionState.java:printError(921)) - Failed with exception java.io.IOException:java.lang.RuntimeException: cannot find field c2 in [c1]
java.io.IOException: java.lang.RuntimeException: cannot find field c2 in [c1]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1029)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1003)
at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:139)
at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_type_promotion(TestCliDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.lang.RuntimeException: cannot find field c2 in [c1]
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.getStructFieldTypeInfo(HiveStructConverter.java:130)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.getFieldTypeIgnoreCase(HiveStructConverter.java:103)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.init(HiveStructConverter.java:90)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.<init>(HiveStructConverter.java:67)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveStructConverter.<init>(HiveStructConverter.java:59)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:63)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:75)
at org.apache.hadoop.hive.ql.io.parquet.convert.HiveCollectionConverter$ElementConverter.<init>(HiveCollectionConverter.java:141)
at org.apache.hadoop.hive.ql.io.parquet.co
[jira] [Created] (HIVE-12475) Parquet schema evolution within array<struct<>> doesn't work
Mohammad Kamrul Islam created HIVE-12475: Summary: Parquet schema evolution within array<struct<>> doesn't work Key: HIVE-12475 URL: https://issues.apache.org/jira/browse/HIVE-12475 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.1.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam
If we create a table with type array<struct<>> and later add a field to the struct, we get the following exception. The following SQL statements recreate the error:
{quote}
CREATE TABLE pq_test (f1 array<struct<c1:int,c2:int>>) STORED AS PARQUET;
INSERT INTO TABLE pq_test select array(named_struct("c1",1,"c2",2)) FROM tmp LIMIT 2;
SELECT * from pq_test;
-- add a new field to the struct
ALTER TABLE pq_test REPLACE COLUMNS (f1 array<struct<c1:int,c2:int,...>>);
SELECT * from pq_test;
{quote}
Exception:
{quote}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:142)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:363)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:316)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:199)
at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:236)
at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:89)
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12080) Support auto type widening for Parquet table
Mohammad Kamrul Islam created HIVE-12080: Summary: Support auto type widening for Parquet table Key: HIVE-12080 URL: https://issues.apache.org/jira/browse/HIVE-12080 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Currently Hive+Parquet doesn't support automatic type widening. It should include at least the basic type promotions (short->int->bigint, float->double, etc.) that are already supported for other file formats. There was a similar effort (HIVE-6784), but it was not committed. This JIRA is to address the same in a different way with little (or no) performance impact. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12018) beeline --help doesn't return to original prompt
Mohammad Kamrul Islam created HIVE-12018: Summary: beeline --help doesn't return to original prompt Key: HIVE-12018 URL: https://issues.apache.org/jira/browse/HIVE-12018 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Priority: Minor
"beeline --help" displays the help message and then returns to the beeline prompt. The common pattern is to return to the unix prompt, since the intention of any command's help is to let the user relaunch the same command with the correct parameters. One such output is:
{quote}
$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
  -u <database url>    the JDBC URL to connect to
  -n <username>        the username to connect as
  -p <password>        the password to connect as
  ...
Beeline version .. by Apache Hive
beeline>
{quote}
The expected return prompt should be "$" (the unix prompt). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10787) MatchPath misses the last matched row from the final result set
Mohammad Kamrul Islam created HIVE-10787: Summary: MatchPath misses the last matched row from the final result set Key: HIVE-10787 URL: https://issues.apache.org/jira/browse/HIVE-10787 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.2.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam
If you have a STAR(*) pattern at the end, the current code misses the last matched row from the final result. For example, with a pattern like (LATE.EARLY*), the matched rows are: 1. LATE 2. EARLY. In the current implementation, the final 'tpath' misses the last "EARLY" and returns only LATE. Ideally it should return both LATE and EARLY. The following code snippet shows the bug.
{noformat}
0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
1. while (rowResult.matches && pItr.hasNext())
2. {
3.   row = pItr.next();
4.   rowResult = symbolFn.match(row, pItr);
5. }
6.
7. result.nextRow = pItr.getIndex() - 1;
{noformat}
Line 7 always moves the row index back by one. If the loop (line 1) is never executed (because pItr.hasNext() returns 'false'), the code still moves the row pointer back by one, even though line 0 found a match and the iterator has reached the end. I'm uploading a patch which I already tested. -- This message was sent by Atlassian JIRA (v6.2#6252)
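A minimal sketch of one way the index bookkeeping in the quoted snippet could be adjusted — not necessarily what the attached patch does — assuming pItr.getIndex() returns the index of the next row to be read, so the "-1" is only correct when the last call to match() did not match:
{noformat}
// Sketch only: reuses the identifiers from the snippet above.
SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
while (rowResult.matches && pItr.hasNext()) {
  row = pItr.next();
  rowResult = symbolFn.match(row, pItr);
}
// If the iterator was exhausted while the last examined row still matched,
// keep that row in the result instead of stepping back over it.
result.nextRow = rowResult.matches ? pItr.getIndex() : pItr.getIndex() - 1;
{noformat}
With the (LATE.EARLY*) example, the loop ends with rowResult.matches true and no rows left, so nextRow is not decremented and the trailing EARLY row is retained.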
[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026233#comment-14026233 ] Mohammad Kamrul Islam commented on HIVE-6638: - agreed to [~ashutoshc]. Please go ahead. > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011831#comment-14011831 ] Mohammad Kamrul Islam commented on HIVE-6638: - Thanks [~ashutoshc] for the review. In MR JIRA, we changed the behavior a little bit. I will upload a patch soon to match with new MR behavior. > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000642#comment-14000642 ] Mohammad Kamrul Islam commented on HIVE-7049: - Null is passed only if record schema is null but file schema is not null. Do you see any use case for decimal too? >Thus, we might need to fix in a different way. Do you want me to fix it differently? or you are looking to address this for decimal differently? > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999687#comment-13999687 ] Mohammad Kamrul Islam commented on HIVE-7049: - [~xuefuz] : can you please help me to understand the problem mentioned in the previous comment? > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999685#comment-13999685 ] Mohammad Kamrul Islam commented on HIVE-3159: - >HIVE-5823 was resolved as WONTFIX. [~cwsteinbach] i see it was committed by [~brocknoland]. Is it possible we are looking into different JIRAs. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, > HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.10.patch Rebasing with latest code. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, > HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Patch Available (was: Open) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, > HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998371#comment-13998371 ] Mohammad Kamrul Islam commented on HIVE-7049: - Thanks @xzhang. >However, the fix in your patch seems having a problem with decimal, which may >need more deliberation. What is the (potential) problem in decimal? Any proposal what to do to address the decimal problem? > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997378#comment-13997378 ] Mohammad Kamrul Islam commented on HIVE-7049: - Thank [~xuefuz] for the comments. I believe it is a valid Avro schema evolution. Please see the following comments copied from the link: http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution {noformat} * if reader's is a union, but writer's is not The first schema in the reader's union that matches the writer's schema is recursively resolved against it. If none match, an error is signalled. * if writer's is a union, but reader's is not If the reader's schema matches the selected writer's schema, it is recursively resolved against it. If they do not match, an error is signalled. {noformat} Moreover, i tested a similar scenarios using pure avro code where i wrote using schema "string" and read it using ["null","string"]. > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
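The experiment mentioned in the comment above — writing with schema "string" and reading with ["null","string"] — can be reproduced with plain Avro. A minimal, self-contained sketch (assuming Avro 1.7.x; the class name NullableUnionResolutionDemo and the variable names are illustrative only):
{noformat}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class NullableUnionResolutionDemo {
  public static void main(String[] args) throws Exception {
    // Writer (file) schema is a plain "string"; reader (record) schema is the nullable union.
    Schema writerSchema = new Schema.Parser().parse("\"string\"");
    Schema readerSchema = new Schema.Parser().parse("[\"null\",\"string\"]");

    // Serialize one value with the writer schema.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<Object>(writerSchema).write("hello", encoder);
    encoder.flush();

    // Deserialize it, letting Avro resolve the writer schema against the reader union.
    BinaryDecoder decoder = DecoderFactory.get()
        .binaryDecoder(new ByteArrayInputStream(out.toByteArray()), null);
    Object value = new GenericDatumReader<Object>(writerSchema, readerSchema).read(null, decoder);
    System.out.println(value); // prints: hello
  }
}
{noformat}
Avro's resolving reader picks the branch of the reader union that matches the writer schema, which is why this pure-Avro round trip succeeds even though the writer schema is not a union.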
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-7049: Attachment: HIVE-7049.1.patch patch uploaded > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995897#comment-13995897 ] Mohammad Kamrul Islam commented on HIVE-3159: - The recently committed HIVE-5823 introduced a bug. I created a separate JIRA (HIVE-7049) to address it and uploaded a patch there. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, > HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995899#comment-13995899 ] Mohammad Kamrul Islam commented on HIVE-5803: - The dependent JIRA resolve the issue. Closing it. > Support CTAS from a non-avro table to an avro table > --- > > Key: HIVE-5803 > URL: https://issues.apache.org/jira/browse/HIVE-5803 > Project: Hive > Issue Type: Task >Reporter: Mohammad Kamrul Islam >Assignee: Carl Steinbach > > Hive currently does not work with HQL like : > CREATE TABLE as SELECT * from ; > Actual it works successfully. But when I run "SELECT * from > .." it fails. > This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. > Findings so far: CTAS uses internal column names (in place of using the > column names provided in select) when crating the AVRO data file. In other > words, avro data file has column names in this form of: _col0, _col1 where > as table column names are different. > I tested with the following test cases and it failed: > - verify 1) can create table using create table as select from non-avro table > 2) LOAD avro data into new table and read data from the new table > CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; > DESCRIBE simple_kv_txt; > LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; > SELECT * FROM simple_kv_txt ORDER BY KEY; > CREATE TABLE copy_doctors ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key > as key, value as value FROM simple_kv_txt; > DESCRIBE copy_doctors; > SELECT * FROM copy_doctors; > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995896#comment-13995896 ] Mohammad Kamrul Islam commented on HIVE-7049: - RB at: https://reviews.apache.org/r/21353/ > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam resolved HIVE-5803. - Resolution: Won't Fix > Support CTAS from a non-avro table to an avro table > --- > > Key: HIVE-5803 > URL: https://issues.apache.org/jira/browse/HIVE-5803 > Project: Hive > Issue Type: Task >Reporter: Mohammad Kamrul Islam >Assignee: Carl Steinbach > > Hive currently does not work with HQL like : > CREATE TABLE as SELECT * from ; > Actual it works successfully. But when I run "SELECT * from > .." it fails. > This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. > Findings so far: CTAS uses internal column names (in place of using the > column names provided in select) when crating the AVRO data file. In other > words, avro data file has column names in this form of: _col0, _col1 where > as table column names are different. > I tested with the following test cases and it failed: > - verify 1) can create table using create table as select from non-avro table > 2) LOAD avro data into new table and read data from the new table > CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; > DESCRIBE simple_kv_txt; > LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; > SELECT * FROM simple_kv_txt ORDER BY KEY; > CREATE TABLE copy_doctors ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key > as key, value as value FROM simple_kv_txt; > DESCRIBE copy_doctors; > SELECT * FROM copy_doctors; > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-7049: Status: Patch Available (was: Open) > Unable to deserialize AVRO data when file schema and record schema are > different and nullable > - > > Key: HIVE-7049 > URL: https://issues.apache.org/jira/browse/HIVE-7049 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-7049.1.patch > > > It mainly happens when > 1 )file schema and record schema are not same > 2 ) Record schema is nullable but file schema is not. > The potential code location is at class AvroDeserialize > > {noformat} > if(AvroSerdeUtils.isNullableType(recordSchema)) { > return deserializeNullableUnion(datum, fileSchema, recordSchema, > columnType); > } > {noformat} > In the above code snippet, recordSchema is verified if it is nullable. But > the file schema is not checked. > I tested with these values: > {noformat} > recordSchema= ["null","string"] > fielSchema= "string" > {noformat} > And i got the following exception mu debugged code version>. > {noformat} > org.apache.avro.AvroRuntimeException: Not a union: "string" > at org.apache.avro.Schema.getTypes(Schema.java:272) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) > at > org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) > at > org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
Mohammad Kamrul Islam created HIVE-7049: --- Summary: Unable to deserialize AVRO data when file schema and record schema are different and nullable Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam
It mainly happens when 1) the file schema and record schema are not the same and 2) the record schema is nullable but the file schema is not. The potential code location is in class AvroDeserializer:
{noformat}
if(AvroSerdeUtils.isNullableType(recordSchema)) {
  return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
}
{noformat}
In the above code snippet, recordSchema is checked for being nullable, but the file schema is not checked. I tested with these values:
{noformat}
recordSchema= ["null","string"]
fileSchema= "string"
{noformat}
And I got the following exception (in my debugged code version):
{noformat}
org.apache.avro.AvroRuntimeException: Not a union: "string"
at org.apache.avro.Schema.getTypes(Schema.java:272)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
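One possible shape of a fix for the snippet quoted above, shown as a sketch only — not necessarily the attached HIVE-7049 patch. It assumes AvroSerdeUtils.getOtherTypeFromNullableType(...) returns the non-null branch of a nullable union, and that the surrounding worker(...) method takes its arguments in the same order as deserializeNullableUnion(...):
{noformat}
// Sketch only: guard the nullable-union path so a non-union file (writer)
// schema is never treated as a union.
if (AvroSerdeUtils.isNullableType(recordSchema)) {
  if (AvroSerdeUtils.isNullableType(fileSchema)) {
    // Both schemas are nullable unions: the existing logic applies.
    return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
  }
  // Only the record (reader) schema is a union: per Avro schema resolution,
  // resolve the datum against the matching non-null branch of the reader schema.
  Schema nonNullRecordSchema = AvroSerdeUtils.getOtherTypeFromNullableType(recordSchema);
  return worker(datum, fileSchema, nonNullRecordSchema, columnType);
}
{noformat}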
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Open (was: Patch Available) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, > HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Affects Version/s: 0.12.0 Status: Patch Available (was: Open) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.9.patch New patch that addresses [~cwsteinbach]'s review comments. This patch adds the following missing functions: 1. Create an AVRO table using the Hive schema (w/o specifying an Avro schema). 2. Copy AVRO table structure and data from an existing non-AVRO table using CTAS. 3. Copy AVRO table structure and data from an existing AVRO table using CTAS. Note: We can close the dependent JIRA HIVE-5803, which is no longer required; another JIRA has already taken care of it. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.12.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980646#comment-13980646 ] Mohammad Kamrul Islam commented on HIVE-3159: - planning to upload a new patch by next week. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980616#comment-13980616 ] Mohammad Kamrul Islam commented on HIVE-3159: - yes. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6638: Status: Patch Available (was: Open) > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.12.0, 0.11.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6638: Attachment: HIVE-6638.2.patch Uploaded in compliance with patch at MAPREDUCE-5812. > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6638: Status: Open (was: Patch Available) > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.12.0, 0.11.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch, HIVE-6638.2.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6638: Status: Patch Available (was: Open) > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.12.0, 0.11.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948801#comment-13948801 ] Mohammad Kamrul Islam commented on HIVE-6638: - In case anyone is interested: the testing is an involved and choreographed process. I tested it as follows:
set mapred.map.tasks.speculative.execution=false;
set mapred.job.map.memory.mb=4096;
set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create table load_overwrite (key string, value string) stored as textfile;
load data local inpath '/tmp/data/' into table load_overwrite;
select key from load_overwrite where length(key) > 0 ;
This assumes /tmp/data has four copies of kv1.txt. Tested against Hadoop 2.3 on a single-node Mac machine. The four tasks will run more or less sequentially. Important: when to kill the MRAM? I killed the MRAM when the second task finished; it could be any time before the last one finished. Command used: "jps |grep MRAppMaster |cut -d' ' -f1|xargs kill" I was monitoring in two ways: 1. cd HADOOP_LOG_DIR/userlogs/ and run "grep -R "New Final Path" *". This shows which tasks have completed with their file written to HDFS. 2. run hadoop fs -lsr hdfs://localhost:9000/tmp/hive-/. This shows all the tasks' output during the execution; at the end, it is cleaned up. Anyway, if you kill the MRAM during the execution, you should see that there are only 4 output files. More importantly, you will see that the tasks completed before the MRAM was killed never rerun, and you still get the correct result. > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6638: Attachment: HIVE-6638.1.patch Initial patch. > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6638.1.patch > > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Status: Patch Available (was: Open) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch, HIVE-6024.6.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.6.patch Re-uploaded to be picked up by jenkins. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch, HIVE-6024.6.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Status: Open (was: Patch Available) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: (was: HIVE-6024.5.patch) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.6.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-6638) Hive needs to implement recovery for Application Master restart
[ https://issues.apache.org/jira/browse/HIVE-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam reassigned HIVE-6638: --- Assignee: Mohammad Kamrul Islam > Hive needs to implement recovery for Application Master restart > > > Key: HIVE-6638 > URL: https://issues.apache.org/jira/browse/HIVE-6638 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.11.0, 0.12.0, 0.13.0 >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > > Currently, if AM restarts, whole job is restarted. Although, job and > subsequently query would still finish to completion, it would be nice if Hive > don't need to redo all the work done under previous AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.6.patch > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.5.patch, HIVE-6024.6.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.5.patch replacing with the intended patch. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch, HIVE-6024.5.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.5.patch Addressed failed test cases. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch, HIVE-6024.5.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.4.patch Addressed unit test failures. Please check 3 .q.out changes. Those were required because this patch removes one extra copy phase. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch, > HIVE-6024.4.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916614#comment-13916614 ] Mohammad Kamrul Islam commented on HIVE-6024: - I didn't find any existing .q file that covered this. Made a comment in RB as well. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Status: Patch Available (was: Open) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Status: Open (was: Patch Available) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.3.patch Updated with review comments. A new .q test is added. > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.2.patch Rebased > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906217#comment-13906217 ] Mohammad Kamrul Islam commented on HIVE-5803: - The linked Jira might solve this problem as well. > Support CTAS from a non-avro table to an avro table > --- > > Key: HIVE-5803 > URL: https://issues.apache.org/jira/browse/HIVE-5803 > Project: Hive > Issue Type: Task >Reporter: Mohammad Kamrul Islam >Assignee: Carl Steinbach > > Hive currently does not work with HQL like : > CREATE TABLE as SELECT * from ; > Actual it works successfully. But when I run "SELECT * from > .." it fails. > This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. > Findings so far: CTAS uses internal column names (in place of using the > column names provided in select) when crating the AVRO data file. In other > words, avro data file has column names in this form of: _col0, _col1 where > as table column names are different. > I tested with the following test cases and it failed: > - verify 1) can create table using create table as select from non-avro table > 2) LOAD avro data into new table and read data from the new table > CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; > DESCRIBE simple_kv_txt; > LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; > SELECT * FROM simple_kv_txt ORDER BY KEY; > CREATE TABLE copy_doctors ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key > as key, value as value FROM simple_kv_txt; > DESCRIBE copy_doctors; > SELECT * FROM copy_doctors; > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
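The internal-name symptom described in HIVE-5803 can be confirmed by looking at the schema embedded in the Avro container file that the CTAS writes. Below is a minimal, hypothetical inspection snippet (not part of any patch on this issue); the class name and the file-path argument are illustrative only.
{code}
import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class InspectAvroSchema {
  public static void main(String[] args) throws Exception {
    // Open the Avro container file written by the CTAS and print the schema
    // stored in it; fields named _col0, _col1, ... are the internal column
    // names this issue describes.
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File(args[0]), new GenericDatumReader<GenericRecord>());
    try {
      System.out.println(reader.getSchema().toString(true));
    } finally {
      reader.close();
    }
  }
}
{code}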
[jira] [Commented] (HIVE-6375) Fix CTAS for parquet
[ https://issues.apache.org/jira/browse/HIVE-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906215#comment-13906215 ] Mohammad Kamrul Islam commented on HIVE-6375: - +1. Reviewed the patch. CTAS for Avro doesn't work for the same reason (HIVE-5803). Hopefully, this patch will help Avro as well. > Fix CTAS for parquet > > > Key: HIVE-6375 > URL: https://issues.apache.org/jira/browse/HIVE-6375 > Project: Hive > Issue Type: Bug >Affects Versions: 0.13.0 >Reporter: Brock Noland >Assignee: Szehon Ho >Priority: Critical > Labels: Parquet > Attachments: HIVE-6375.2.patch, HIVE-6375.patch > > > More details here: > https://github.com/Parquet/parquet-mr/issues/272 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Attachment: HIVE-6024.1.patch Also updated in RB: https://reviews.apache.org/r/18065/ > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-6024: Status: Patch Available (was: Open) > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-6024.1.patch > > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HIVE-6024) Load data local inpath unnecessarily creates a copy task
[ https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam reassigned HIVE-6024: --- Assignee: Mohammad Kamrul Islam > Load data local inpath unnecessarily creates a copy task > > > Key: HIVE-6024 > URL: https://issues.apache.org/jira/browse/HIVE-6024 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Ashutosh Chauhan >Assignee: Mohammad Kamrul Islam > > Load data command creates an additional copy task only when its loading from > {{local}} It doesn't create this additional copy task while loading from DFS > though. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6327) A few mathematic functions don't take decimal input
[ https://issues.apache.org/jira/browse/HIVE-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889827#comment-13889827 ] Mohammad Kamrul Islam commented on HIVE-6327: - Left a few minor comments in RB. > A few mathematic functions don't take decimal input > --- > > Key: HIVE-6327 > URL: https://issues.apache.org/jira/browse/HIVE-6327 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.11.0, 0.12.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-6327.patch > > > A few mathematical functions, such as sin(), cos(), etc., don't take decimal as an > argument. > {code} > hive> show tables; > OK > Time taken: 0.534 seconds > hive> create table test(d decimal(5,2)); > OK > Time taken: 0.351 seconds > hive> select sin(d) from test; > FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'd': No > matching method for class org.apache.hadoop.hive.ql.udf.UDFSin with > (decimal(5,2)). Possible choices: _FUNC_(double) > {code} > HIVE-6246 covers only the sign() function; the remaining ones (sin, > cos, tan, asin, acos, atan, exp, ln, log, log10, log2, radians, and sqrt) still > need the same treatment. These are non-generic UDFs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
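A minimal sketch of the kind of change this issue asks for, assuming the function is rewritten as a GenericUDF that converts any numeric argument (including decimal) to double before applying Math.sin. The class name is made up, and this illustrates the approach rather than the actual HIVE-6327 patch.
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericUDFSinSketch extends GenericUDF {
  private transient ObjectInspectorConverters.Converter toDouble;
  private final DoubleWritable result = new DoubleWritable();

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentLengthException {
    if (arguments.length != 1) {
      throw new UDFArgumentLengthException("sin() takes exactly one argument");
    }
    // Accept double, decimal, int, ... by converting whatever arrives to double.
    toDouble = ObjectInspectorConverters.getConverter(arguments[0],
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
    return PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object converted = toDouble.convert(arguments[0].get());
    if (converted == null) {
      return null;
    }
    result.set(Math.sin(((DoubleWritable) converted).get()));
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "sin(" + children[0] + ")";
  }
}
{code}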
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888288#comment-13888288 ] Mohammad Kamrul Islam commented on HIVE-3159: - Patch updated with review comments addressed at RB. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
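As a rough illustration of what "converting the table's TypeInfo into an appropriate Avro schema" could look like, here is a minimal sketch that maps a struct TypeInfo with a few primitive field types onto an Avro record using Avro's SchemaBuilder. The record name and the handled types are arbitrary; this is not the HIVE-3159 implementation.
{code}
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class TypeInfoToAvroSketch {
  // Map a struct TypeInfo (the table's columns) to an Avro record schema.
  // Only string/int/double are handled, to keep the sketch short.
  static Schema toAvro(StructTypeInfo struct) {
    SchemaBuilder.FieldAssembler<Schema> fields =
        SchemaBuilder.record("hive_record").fields();
    List<String> names = struct.getAllStructFieldNames();
    List<TypeInfo> types = struct.getAllStructFieldTypeInfos();
    for (int i = 0; i < names.size(); i++) {
      String typeName = types.get(i).getTypeName();
      if ("string".equals(typeName)) {
        fields = fields.name(names.get(i)).type().stringType().noDefault();
      } else if ("int".equals(typeName)) {
        fields = fields.name(names.get(i)).type().intType().noDefault();
      } else if ("double".equals(typeName)) {
        fields = fields.name(names.get(i)).type().doubleType().noDefault();
      } else {
        throw new IllegalArgumentException("unhandled type: " + typeName);
      }
    }
    return fields.endRecord();
  }

  public static void main(String[] args) {
    StructTypeInfo t = (StructTypeInfo) TypeInfoUtils
        .getTypeInfoFromTypeString("struct<key:string,value:int>");
    System.out.println(toAvro(t).toString(true));
  }
}
{code}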
[jira] [Commented] (HIVE-6246) Sign(a) UDF is not supported for decimal type
[ https://issues.apache.org/jira/browse/HIVE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880483#comment-13880483 ] Mohammad Kamrul Islam commented on HIVE-6246: - Left comments in RB. > Sign(a) UDF is not supported for decimal type > - > > Key: HIVE-6246 > URL: https://issues.apache.org/jira/browse/HIVE-6246 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 0.12.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-6246.patch > > > java.sql.SQLException: Error while compiling statement: FAILED: > SemanticException [Error 10014]: Line 1:86 Wrong arguments 'a': No matching > method for class org.apache.hadoop.hive.ql.udf.UDFSign with (decimal(38,10)). > Possible choices: _FUNC_(double) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6182) LDAP Authentication errors need to be more informative
[ https://issues.apache.org/jira/browse/HIVE-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869175#comment-13869175 ] Mohammad Kamrul Islam commented on HIVE-6182: - So what is the plan for beeline exception? No fix? or fix in different JIRA? > LDAP Authentication errors need to be more informative > -- > > Key: HIVE-6182 > URL: https://issues.apache.org/jira/browse/HIVE-6182 > Project: Hive > Issue Type: Improvement > Components: Authentication >Affects Versions: 0.13.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-6182.patch > > > There are a host of errors that can happen when logging into an LDAP-enabled > Hive-server2 from beeline. But for any error there is only a generic log > message: > {code} > SASL negotiation failure > javax.security.sasl.SaslException: PLAIN auth failed: Error validating LDAP > user > at > org.apache.hadoop.security.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108) > at > org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrRespons > {code} > And on Beeline side there is only an even more unhelpful message: > {code} > Error: Invalid URL: jdbc:hive2://localhost:1/default (state=08S01,code=0) > {code} > It would be good to print out the underlying error message at least in the > log, if not beeline. But today they are swallowed. This is bad because the > underlying message is the most important, having the error codes as shown > here : [LDAP error > code|https://wiki.servicenow.com/index.php?title=LDAP_Error_Codes] > The beeline seems to throw that exception for any error during connection, > authetication or otherwise. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
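As a hedged illustration of "print out the underlying error message at least in the log", the helper below walks the exception's cause chain and appends the deepest cause (for example the LDAP error-code text) to the message instead of letting it be swallowed. The class and method names are made up, and this is not the HIVE-6182 patch.
{code}
public final class RootCauseSketch {
  private RootCauseSketch() {
  }

  // Keep the original SaslException text but also surface the root cause.
  public static String describe(Throwable t) {
    Throwable root = t;
    while (root.getCause() != null && root.getCause() != root) {
      root = root.getCause();
    }
    return t.getMessage() + " (root cause: " + root + ")";
  }
}
{code}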
[jira] [Commented] (HIVE-6174) Beeline "set varible" doesn't show the value of the variable as Hive CLI
[ https://issues.apache.org/jira/browse/HIVE-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869172#comment-13869172 ] Mohammad Kamrul Islam commented on HIVE-6174: - +1 Looks very straight forward. > Beeline "set varible" doesn't show the value of the variable as Hive CLI > > > Key: HIVE-6174 > URL: https://issues.apache.org/jira/browse/HIVE-6174 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.10.0, 0.11.0, 0.12.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-5174.3.patch, HIVE-6174.2.patch, HIVE-6174.patch > > > Currently it displays nothing. > {code} > 0: jdbc:hive2://> set env:TERM; > 0: jdbc:hive2://> > {code} > In contrast, Hive CLI displays the value of the variable. > {code} > hive> set env:TERM; > env:TERM=xterm > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6185) DDLTask is inconsistent in creating a table and adding a partition when dealing with location
[ https://issues.apache.org/jira/browse/HIVE-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869170#comment-13869170 ] Mohammad Kamrul Islam commented on HIVE-6185: - Patch looks good! Few comments: 1. In Partition::setBucketCount(), FileSystem fs = FileSystem.get(getDataLocation().toUri(), Hive.get().getConf()) can be rewritten as (to make it consistent for other places): FileSystem fs = getDataLocation().getFileSystem(Hive.get().getConf()); 2. Same thing in SamplePruner:: limitPrune() FileSystem fs = FileSystem.get(part.getDataLocation().toUri(), Hive.get() .getConf()); can be rewritten as FileSystem fs = part.getDataLocation().getFileSystem(Hive.get().getConf()); 3. In Partition.java A new method "public Path getDataLocation() " is introduced. Is it replacing "public Path getPartitionPath() " or "final public URI getDataLocation()"? If it is the later one, do we need to keep the "final" modifier? > DDLTask is inconsistent in creating a table and adding a partition when > dealing with location > - > > Key: HIVE-6185 > URL: https://issues.apache.org/jira/browse/HIVE-6185 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.12.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-6185.1.patch, HIVE-6185.2.patch, HIVE-6185.patch, > HIVE-6185.patch > > > When creating a table, Hive uses URI to represent location: > {code} > if (crtTbl.getLocation() != null) { > tbl.setDataLocation(new Path(crtTbl.getLocation()).toUri()); > } > {code} > When adding a partition, Hive uses Path to represent location: > {code} > // set partition path relative to table > db.createPartition(tbl, addPartitionDesc.getPartSpec(), new Path(tbl > .getPath(), addPartitionDesc.getLocation()), > addPartitionDesc.getPartParams(), > addPartitionDesc.getInputFormat(), > addPartitionDesc.getOutputFormat(), > addPartitionDesc.getNumBuckets(), > addPartitionDesc.getCols(), > addPartitionDesc.getSerializationLib(), > addPartitionDesc.getSerdeParams(), > addPartitionDesc.getBucketCols(), > addPartitionDesc.getSortCols()); > {code} > This disparity makes the values stored in metastore be encoded differently, > causing problems w.r.t. special character as demonstrated in HIVE-5446. As a > result, the code dealing with location for table is different for partition, > creating maintenance burden. > We need to standardize it to Path to be in line with other Path related > cleanup effort. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
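The first two review comments above amount to the same one-line refactor; a compilable sketch of the suggested form follows, with placeholder method and parameter names rather than the actual Partition/SamplePruner code.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSystemLookupSketch {
  // Instead of FileSystem.get(dataLocation.toUri(), conf), resolve the
  // FileSystem from the Path itself, as the comments suggest.
  static FileSystem resolve(Path dataLocation, Configuration conf) throws IOException {
    return dataLocation.getFileSystem(conf);
  }
}
{code}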
[jira] [Commented] (HIVE-6171) Use Paths consistently - V
[ https://issues.apache.org/jira/browse/HIVE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866284#comment-13866284 ] Mohammad Kamrul Islam commented on HIVE-6171: - +1 for the latest patch. Minor comments: method name could be changed as well from "getExternalTmpFileURI" to "getExternalTmpPath" to be more specific. > Use Paths consistently - V > -- > > Key: HIVE-6171 > URL: https://issues.apache.org/jira/browse/HIVE-6171 > Project: Hive > Issue Type: Improvement >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-6171.1.patch, HIVE-6171.patch > > > Next in series for consistent usage of Paths in Hive. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866272#comment-13866272 ] Mohammad Kamrul Islam commented on HIVE-3159: - [~cwsteinbach] can't reproduce it. Uploaded a rebased version of patch. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.7.patch > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Patch Available (was: Open) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159.7.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Resolution: Fixed Status: Resolved (was: Patch Available) > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, > HIVE-5829.4.patch, tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.4.patch reviewer's comments addressed. > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, > HIVE-5829.4.patch, tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.7.patch Included review comments. > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch, HIVE-5731.5.patch, HIVE-5731.6.patch, HIVE-5731.7.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes to extend GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1.5#6160)
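The quoted benefits (variable-length arguments and lazy evaluation through DeferredObject) can be seen in a small self-contained example. The sketch below is a coalesce-like UDF that returns the first non-null argument; it is illustrative only and, to stay short, assumes every argument has the same type as the first one. It is not part of the HIVE-5731 patch.
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;

public class GenericUDFFirstNonNullSketch extends GenericUDF {
  private transient ObjectInspector[] inputOIs;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentLengthException {
    if (arguments.length < 1) {
      throw new UDFArgumentLengthException("at least one argument is required");
    }
    inputOIs = arguments;
    // Assumption for the sketch: all arguments share the first argument's type;
    // results are returned in that type's standard Java representation.
    return ObjectInspectorUtils.getStandardObjectInspector(arguments[0],
        ObjectInspectorUtils.ObjectInspectorCopyOption.JAVA);
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    for (int i = 0; i < arguments.length; i++) {
      Object value = arguments[i].get(); // arguments are evaluated lazily, one at a time
      if (value != null) {
        // Convert from the argument's own representation to the declared one.
        return ObjectInspectorUtils.copyToStandardJavaObject(value, inputOIs[i]);
      }
    }
    return null;
  }

  @Override
  public String getDisplayString(String[] children) {
    StringBuilder sb = new StringBuilder("first_non_null(");
    for (int i = 0; i < children.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(children[i]);
    }
    return sb.append(")").toString();
  }
}
{code}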
[jira] [Commented] (HIVE-5992) Hive inconsistently converts timestamp in AVG and SUM UDAF's
[ https://issues.apache.org/jira/browse/HIVE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852540#comment-13852540 ] Mohammad Kamrul Islam commented on HIVE-5992: - Looks good. +1 > Hive inconsistently converts timestamp in AVG and SUM UDAF's > > > Key: HIVE-5992 > URL: https://issues.apache.org/jira/browse/HIVE-5992 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 0.12.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-5992.patch > > > {code} > hive> select t, sum(t), count(*), sum(t)/count(*), avg(t) from ts group by t; > ... > OK > 1977-03-15 12:34:22.345678 227306062 1 227306062 > 2.27306062345678E8 > {code} > As it can be seen, timestamp value (1977-03-15 12:34:22.345678) is converted > with fractional part ignored in sum, while preserved in avg. As a further > result, sum()/count() is not equivalent to avg. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
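To make the inconsistency concrete, the snippet below converts the same timestamp to a numeric value once without and once with its fractional (nanosecond) part, which is exactly the difference between the sum and avg results shown above. This is plain arithmetic for illustration, not Hive's UDAF code, and the printed epoch value depends on the local timezone.
{code}
import java.sql.Timestamp;

public class TimestampFractionSketch {
  public static void main(String[] args) {
    Timestamp ts = Timestamp.valueOf("1977-03-15 12:34:22.345678");
    long wholeSeconds = ts.getTime() / 1000;                  // fractional part dropped
    double withFraction = wholeSeconds + ts.getNanos() / 1e9; // fractional part kept
    System.out.println(wholeSeconds);   // e.g. 227306062 in the reporter's timezone
    System.out.println(withFraction);   // e.g. 2.27306062345678E8
  }
}
{code}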
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.3.patch > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, > tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Status: Patch Available (was: Open) > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, > tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: (was: HIVE-5829.3.patch) > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.6.patch > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Patch Available (was: Open) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, > HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.3.patch Includes Carl's comments of moving the Test* file to correct location. > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, > tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.2.patch tmp.HIVE-5829.patch Addressed the failed test case and rebased with latest code base. > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Status: Patch Available (was: Open) > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.1.patch Also updated to RB: https://reviews.apache.org/r/15654/ > Rewrite Trim and Pad UDFs based on GenericUDF > - > > Key: HIVE-5829 > URL: https://issues.apache.org/jira/browse/HIVE-5829 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5829.1.patch > > > This JIRA includes following UDFs: > 1. trim() > 2. ltrim() > 3. rtrim() > 4. lpad() > 5. rpad() -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
Mohammad Kamrul Islam created HIVE-5829: --- Summary: Rewrite Trim and Pad UDFs based on GenericUDF Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam This JIRA includes following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.5.patch Rebasing with new mvn-based codebase. > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.6.patch > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch, HIVE-5731.5.patch, HIVE-5731.6.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820736#comment-13820736 ] Mohammad Kamrul Islam commented on HIVE-5731: - RB Updated. > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch, HIVE-5731.5.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Affects Version/s: (was: 0.11.0) (was: 0.10.0) Status: Patch Available (was: Open) > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.4.patch > Update AvroSerde to determine schema of new tables > -- > > Key: HIVE-3159 > URL: https://issues.apache.org/jira/browse/HIVE-3159 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Affects Versions: 0.10.0, 0.11.0 >Reporter: Jakob Homan >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-3159.4.patch, HIVE-3159v1.patch > > > Currently when writing tables to Avro one must manually provide an Avro > schema that matches what is being delivered by Hive. It'd be better to have > the serde infer this schema by converting the table's TypeInfo into an > appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5803) Support CTAS from a non-avro table to an avro table
Mohammad Kamrul Islam created HIVE-5803: --- Summary: Support CTAS from a non-avro table to an avro table Key: HIVE-5803 URL: https://issues.apache.org/jira/browse/HIVE-5803 Project: Hive Issue Type: Task Reporter: Mohammad Kamrul Islam Hive currently does not work with HQL like : CREATE TABLE as SELECT * from ; Actual it works successfully. But when I run "SELECT * from .." it fails. This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. Findings so far: CTAS uses internal column names (in place of using the column names provided in select) when crating the AVRO data file. In other words, avro data file has column names in this form of: _col0, _col1 where as table column names are different. I tested with the following test cases and it failed: - verify 1) can create table using create table as select from non-avro table 2) LOAD avro data into new table and read data from the new table CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; DESCRIBE simple_kv_txt; LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; SELECT * FROM simple_kv_txt ORDER BY KEY; CREATE TABLE copy_doctors ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key as key, value as value FROM simple_kv_txt; DESCRIBE copy_doctors; SELECT * FROM copy_doctors; -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam reassigned HIVE-5803: --- Assignee: Carl Steinbach > Support CTAS from a non-avro table to an avro table > --- > > Key: HIVE-5803 > URL: https://issues.apache.org/jira/browse/HIVE-5803 > Project: Hive > Issue Type: Task >Reporter: Mohammad Kamrul Islam >Assignee: Carl Steinbach > > Hive currently does not work with HQL like : > CREATE TABLE as SELECT * from ; > Actual it works successfully. But when I run "SELECT * from > .." it fails. > This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. > Findings so far: CTAS uses internal column names (in place of using the > column names provided in select) when crating the AVRO data file. In other > words, avro data file has column names in this form of: _col0, _col1 where > as table column names are different. > I tested with the following test cases and it failed: > - verify 1) can create table using create table as select from non-avro table > 2) LOAD avro data into new table and read data from the new table > CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; > DESCRIBE simple_kv_txt; > LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; > SELECT * FROM simple_kv_txt ORDER BY KEY; > CREATE TABLE copy_doctors ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key > as key, value as value FROM simple_kv_txt; > DESCRIBE copy_doctors; > SELECT * FROM copy_doctors; > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819320#comment-13819320 ] Mohammad Kamrul Islam commented on HIVE-5731: - [~appodictic] : I don't know if anyone has done any benchmarking to compare those. > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch, HIVE-5731.5.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.5.patch Included Ashutosh's comment > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch, HIVE-5731.5.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5790) maven test build failure shows wrong error message
[ https://issues.apache.org/jira/browse/HIVE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5790: Status: Patch Available (was: Open) > maven test build failure shows wrong error message > --- > > Key: HIVE-5790 > URL: https://issues.apache.org/jira/browse/HIVE-5790 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5790.1.patch > > > This is the error message that was correct for ant. > "See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get > more logs." > This JIRA is to replace this message with mvn-specific error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.4.patch Addressed build test case failure. > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, > HIVE-5731.4.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5790) maven test build failure shows wrong error message
[ https://issues.apache.org/jira/browse/HIVE-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5790: Attachment: HIVE-5790.1.patch Initial patch. > maven test build failure shows wrong error message > --- > > Key: HIVE-5790 > URL: https://issues.apache.org/jira/browse/HIVE-5790 > Project: Hive > Issue Type: Bug >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5790.1.patch > > > This is the error message that was correct for ant. > "See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get > more logs." > This JIRA is to replace this message with mvn-specific error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5790) maven test build failure shows wrong error message
Mohammad Kamrul Islam created HIVE-5790: --- Summary: maven test build failure shows wrong error message Key: HIVE-5790 URL: https://issues.apache.org/jira/browse/HIVE-5790 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam This is the error message that was correct for ant. "See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get more logs." This JIRA is to replace this message with mvn-specific error message. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5753) Remove collector from Operator base class
[ https://issues.apache.org/jira/browse/HIVE-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5753: Status: Patch Available (was: Open) > Remove collector from Operator base class > - > > Key: HIVE-5753 > URL: https://issues.apache.org/jira/browse/HIVE-5753 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5753.1.patch > > > Collector is required for few operators. Managing this into base class is > overkill and bad design. This JIRA is to refactor the code pushing this to > where it is required. > Background: > https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5753) Remove collector from Operator base class
[ https://issues.apache.org/jira/browse/HIVE-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5753: Attachment: HIVE-5753.1.patch > Remove collector from Operator base class > - > > Key: HIVE-5753 > URL: https://issues.apache.org/jira/browse/HIVE-5753 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5753.1.patch > > > Collector is required for few operators. Managing this into base class is > overkill and bad design. This JIRA is to refactor the code pushing this to > where it is required. > Background: > https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HIVE-5753) Remove collector from Operator base class
Mohammad Kamrul Islam created HIVE-5753: --- Summary: Remove collector from Operator base class Key: HIVE-5753 URL: https://issues.apache.org/jira/browse/HIVE-5753 Project: Hive Issue Type: Improvement Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Collector is required for few operators. Managing this into base class is overkill and bad design. This JIRA is to refactor the code pushing this to where it is required. Background: https://issues.apache.org/jira/browse/HIVE-5345?focusedCommentId=13775665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13775665 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.3.patch Addressed the build error. > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5221: Status: Patch Available (was: Open) > Issue in column type with data type as BINARY > - > > Key: HIVE-5221 > URL: https://issues.apache.org/jira/browse/HIVE-5221 > Project: Hive > Issue Type: Bug >Reporter: Arun Vasu >Assignee: Mohammad Kamrul Islam >Priority: Critical > Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch > > > Hi, > I am using Hive 10. When I create an external table with column type as > Binary, the query result on the table is showing some junk values for the > column with binary datatype. > Please find below the query I have used to create the table: > CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN,email STRING, bitfld BINARY) > ROW FORMAT DELIMITED >FIELDS TERMINATED BY '^' >LINES TERMINATED BY '\n' > STORED AS TEXTFILE > LOCATION '/user/hivetables/testbinary'; > The query I have used is : select * from bool1 > The sample data in the hdfs file is: > 0^a...@abc.com^001 > 1^a...@abc.com^010 > ^a...@abc.com^011 > ^a...@abc.com^100 > t^a...@abc.com^101 > f^a...@abc.com^110 > true^a...@abc.com^111 > false^a...@abc.com^001 > 123^^01100010 > 12344^^0111 > Please share your inputs if it is possible. > Thanks, > Arun -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5731: Attachment: HIVE-5731.2.patch Review board: https://reviews.apache.org/r/15213/ > Use new GenericUDF instead of basic UDF for UDFDate* classes > - > > Key: HIVE-5731 > URL: https://issues.apache.org/jira/browse/HIVE-5731 > Project: Hive > Issue Type: Improvement >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch > > > GenericUDF class is the latest and recommended base class for any UDFs. > This JIRA is to change the current UDFDate* classes extended from GenericUDF. > The general benefit of GenericUDF is described in comments as > "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can > accept arguments of complex types, and return complex types. 2. It can > accept > variable length of arguments. 3. It can accept an infinite number of > function > signature - for example, it's easy to write a GenericUDF that accepts > array, array> and so on (arbitrary levels of nesting). 4. > It > can do short-circuit evaluations using DeferedObject." -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-5731:
----------------------------------------
    Status: Patch Available  (was: Open)

> Use new GenericUDF instead of basic UDF for UDFDate* classes
> ------------------------------------------------------------
>
>          Key: HIVE-5731
>          URL: https://issues.apache.org/jira/browse/HIVE-5731
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Mohammad Kamrul Islam
>     Assignee: Mohammad Kamrul Islam
>  Attachments: HIVE-5731.1.patch
>
> GenericUDF class is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFDate* classes to extend from GenericUDF.
> The general benefit of GenericUDF is described in its comments as:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signatures - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Assigned] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam reassigned HIVE-5731:
-------------------------------------------
    Assignee: Mohammad Kamrul Islam

> Use new GenericUDF instead of basic UDF for UDFDate* classes
> ------------------------------------------------------------
>
>          Key: HIVE-5731
>          URL: https://issues.apache.org/jira/browse/HIVE-5731
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Mohammad Kamrul Islam
>     Assignee: Mohammad Kamrul Islam
>  Attachments: HIVE-5731.1.patch
>
> GenericUDF class is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFDate* classes to extend from GenericUDF.
> The general benefit of GenericUDF is described in its comments as:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signatures - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
[ https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-5731:
----------------------------------------
    Attachment: HIVE-5731.1.patch

> Use new GenericUDF instead of basic UDF for UDFDate* classes
> ------------------------------------------------------------
>
>          Key: HIVE-5731
>          URL: https://issues.apache.org/jira/browse/HIVE-5731
>      Project: Hive
>   Issue Type: Improvement
>     Reporter: Mohammad Kamrul Islam
>     Assignee: Mohammad Kamrul Islam
>  Attachments: HIVE-5731.1.patch
>
> GenericUDF class is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFDate* classes to extend from GenericUDF.
> The general benefit of GenericUDF is described in its comments as:
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can accept
> variable length of arguments. 3. It can accept an infinite number of function
> signatures - for example, it's easy to write a GenericUDF that accepts
> array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
> can do short-circuit evaluations using DeferedObject."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Created] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes
Mohammad Kamrul Islam created HIVE-5731:
----------------------------------------

         Summary: Use new GenericUDF instead of basic UDF for UDFDate* classes
             Key: HIVE-5731
             URL: https://issues.apache.org/jira/browse/HIVE-5731
         Project: Hive
      Issue Type: Improvement
        Reporter: Mohammad Kamrul Islam

GenericUDF class is the latest and recommended base class for any UDFs.
This JIRA is to change the current UDFDate* classes to extend from GenericUDF.
The general benefit of GenericUDF is described in its comments as:
"* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
accept arguments of complex types, and return complex types. 2. It can accept
variable length of arguments. 3. It can accept an infinite number of function
signatures - for example, it's easy to write a GenericUDF that accepts
array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
can do short-circuit evaluations using DeferedObject."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-5221:
----------------------------------------
    Summary: Issue in column type with data type as BINARY  (was: Issue in colun type with data type as BINARY)

> Issue in column type with data type as BINARY
> ----------------------------------------------
>
>          Key: HIVE-5221
>          URL: https://issues.apache.org/jira/browse/HIVE-5221
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Arun Vasu
>     Assignee: Mohammad Kamrul Islam
>     Priority: Critical
>  Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch
>
> Hi,
> I am using Hive 10. When I create an external table with a column of type
> BINARY, the query result shows junk values for that binary column.
> Please find below the query I have used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN, email STRING, bitfld BINARY)
> ROW FORMAT DELIMITED
>    FIELDS TERMINATED BY '^'
>    LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I have used is: select * from bool1
> The sample data in the hdfs file is:
> 0^a...@abc.com^001
> 1^a...@abc.com^010
> ^a...@abc.com^011
> ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^^01100010
> 12344^^0111
> Please share your inputs if it is possible.
> Thanks,
> Arun

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Updated] (HIVE-5221) Issue in column type with data type as BINARY
[ https://issues.apache.org/jira/browse/HIVE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-5221:
----------------------------------------
    Attachment: HIVE-5221.2.patch

Updated with Ashutosh's comment addressed.

> Issue in column type with data type as BINARY
> ----------------------------------------------
>
>          Key: HIVE-5221
>          URL: https://issues.apache.org/jira/browse/HIVE-5221
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Arun Vasu
>     Assignee: Mohammad Kamrul Islam
>     Priority: Critical
>  Attachments: HIVE-5221.1.patch, HIVE-5221.2.patch
>
> Hi,
> I am using Hive 10. When I create an external table with a column of type
> BINARY, the query result shows junk values for that binary column.
> Please find below the query I have used to create the table:
> CREATE EXTERNAL TABLE BOOL1(NB BOOLEAN, email STRING, bitfld BINARY)
> ROW FORMAT DELIMITED
>    FIELDS TERMINATED BY '^'
>    LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/hivetables/testbinary';
> The query I have used is: select * from bool1
> The sample data in the hdfs file is:
> 0^a...@abc.com^001
> 1^a...@abc.com^010
> ^a...@abc.com^011
> ^a...@abc.com^100
> t^a...@abc.com^101
> f^a...@abc.com^110
> true^a...@abc.com^111
> false^a...@abc.com^001
> 123^^01100010
> 12344^^0111
> Please share your inputs if it is possible.
> Thanks,
> Arun

--
This message was sent by Atlassian JIRA
(v6.1#6144)