[jira] Updated: (HIVE-1423) Remove Thrift/FB303 headers/src from Hive source tree

2010-06-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1423:
-

Component/s: Server Infrastructure

> Remove Thrift/FB303 headers/src from Hive source tree
> -
>
> Key: HIVE-1423
> URL: https://issues.apache.org/jira/browse/HIVE-1423
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients, Server Infrastructure
>Reporter: Carl Steinbach
>
> There is a fair amount of code from the Thrift and fb303 libraries that was
> checked into the Hive source tree as part of HIVE-73. This code should be
> removed, and the ODBC driver Makefile should be reworked to depend on the
> contents of THRIFT_HOME and FB303_HOME as defined by the user.
> {code}
> ./service/include/thrift/concurrency/Exception.h
> ./service/include/thrift/concurrency/FunctionRunner.h
> ./service/include/thrift/concurrency/Monitor.h
> ./service/include/thrift/concurrency/Mutex.h
> ./service/include/thrift/concurrency/PosixThreadFactory.h
> ./service/include/thrift/concurrency/Thread.h
> ./service/include/thrift/concurrency/ThreadManager.h
> ./service/include/thrift/concurrency/TimerManager.h
> ./service/include/thrift/concurrency/Util.h
> ./service/include/thrift/config.h
> ./service/include/thrift/fb303/FacebookBase.h
> ./service/include/thrift/fb303/FacebookService.cpp
> ./service/include/thrift/fb303/FacebookService.h
> ./service/include/thrift/fb303/fb303_constants.cpp
> ./service/include/thrift/fb303/fb303_constants.h
> ./service/include/thrift/fb303/fb303_types.cpp
> ./service/include/thrift/fb303/fb303_types.h
> ./service/include/thrift/fb303/if/fb303.thrift
> ./service/include/thrift/fb303/out
> ./service/include/thrift/fb303/ServiceTracker.h
> ./service/include/thrift/if/reflection_limited.thrift
> ./service/include/thrift/processor/PeekProcessor.h
> ./service/include/thrift/processor/StatsProcessor.h
> ./service/include/thrift/protocol/TBase64Utils.h
> ./service/include/thrift/protocol/TBinaryProtocol.h
> ./service/include/thrift/protocol/TCompactProtocol.h
> ./service/include/thrift/protocol/TDebugProtocol.h
> ./service/include/thrift/protocol/TDenseProtocol.h
> ./service/include/thrift/protocol/TJSONProtocol.h
> ./service/include/thrift/protocol/TOneWayProtocol.h
> ./service/include/thrift/protocol/TProtocol.h
> ./service/include/thrift/protocol/TProtocolException.h
> ./service/include/thrift/protocol/TProtocolTap.h
> ./service/include/thrift/reflection_limited_types.h
> ./service/include/thrift/server/TNonblockingServer.h
> ./service/include/thrift/server/TServer.h
> ./service/include/thrift/server/TSimpleServer.h
> ./service/include/thrift/server/TThreadedServer.h
> ./service/include/thrift/server/TThreadPoolServer.h
> ./service/include/thrift/Thrift.h
> ./service/include/thrift/TLogging.h
> ./service/include/thrift/TProcessor.h
> ./service/include/thrift/transport/TBufferTransports.h
> ./service/include/thrift/transport/TFDTransport.h
> ./service/include/thrift/transport/TFileTransport.h
> ./service/include/thrift/transport/THttpClient.h
> ./service/include/thrift/transport/TServerSocket.h
> ./service/include/thrift/transport/TServerTransport.h
> ./service/include/thrift/transport/TShortReadTransport.h
> ./service/include/thrift/transport/TSimpleFileTransport.h
> ./service/include/thrift/transport/TSocket.h
> ./service/include/thrift/transport/TSocketPool.h
> ./service/include/thrift/transport/TTransport.h
> ./service/include/thrift/transport/TTransportException.h
> ./service/include/thrift/transport/TTransportUtils.h
> ./service/include/thrift/transport/TZlibTransport.h
> ./service/include/thrift/TReflectionLocal.h
> ./service/lib/php/autoload.php
> ./service/lib/php/ext/thrift_protocol
> ./service/lib/php/ext/thrift_protocol/config.m4
> ./service/lib/php/ext/thrift_protocol/php_thrift_protocol.cpp
> ./service/lib/php/ext/thrift_protocol/php_thrift_protocol.h
> ./service/lib/php/ext/thrift_protocol/tags/1.0.0/config.m4
> ./service/lib/php/ext/thrift_protocol/tags/1.0.0/php_thrift_protocol.cpp
> ./service/lib/php/ext/thrift_protocol/tags/1.0.0/php_thrift_protocol.h
> ./service/lib/php/packages/fb303/FacebookService.php
> ./service/lib/php/packages/fb303/fb303_types.php
> ./service/lib/php/protocol/TBinaryProtocol.php
> ./service/lib/php/protocol/TProtocol.php
> ./service/lib/php/Thrift.php
> ./service/lib/php/transport/TBufferedTransport.php
> ./service/lib/php/transport/TFramedTransport.php
> ./service/lib/php/transport/THttpClient.php
> ./service/lib/php/transport/TMemoryBuffer.php
> ./service/lib/php/transport/TNullTransport.php
> ./service/lib/php/transport/TPhpStream.php
> ./service/lib/php/transport/TSocket.php
> ./service/lib/php/transport/TSocketPool.php
> ./service/lib/php/transport/TTransport.php
> ./service/lib/py/fb303/__init__.py
> ./service/lib/py/fb303/cons

[jira] Updated: (HIVE-73) Thrift Server and Client for Hive

2010-06-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-73:
---

Fix Version/s: 0.3.0
   (was: 0.6.0)

> Thrift Server and Client for Hive
> -
>
> Key: HIVE-73
> URL: https://issues.apache.org/jira/browse/HIVE-73
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients, Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
> Fix For: 0.3.0
>
> Attachments: hive-73.1.patch, hive-73.10.patch, hive-73.11.patch, 
> hive-73.12.patch, hive-73.2.patch, hive-73.3.txt, hive-73.4.txt, 
> hive-73.5.txt, hive-73.6.patch, hive-73.7.patch, hive-73.8.patch, 
> hive-73.9.patch
>
>
> Currently the Hive CLI directly calls the driver code. We need to be able to 
> run a standalone Hive server that multiple clients can connect to. The Hive 
> server will allow clients to run queries as well as make metadata calls (by 
> inheriting from the Thrift metastore server).
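
For context, a minimal sketch of how a client might talk to such a standalone server over Thrift. This is a sketch under assumptions: the host, port, and query are made up, and the ThriftHive.Client API shown (execute plus fetchAll) reflects the generated Thrift service as I understand it, not necessarily the exact code from this issue.

{code}
import java.util.List;

import org.apache.hadoop.hive.service.ThriftHive;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class HiveServerClientSketch {
  public static void main(String[] args) throws Exception {
    // Connect to a standalone Hive server; host and port are assumptions.
    TTransport transport = new TSocket("localhost", 10000);
    transport.open();
    ThriftHive.Client client = new ThriftHive.Client(new TBinaryProtocol(transport));

    // Run a query remotely and fetch the result rows as strings.
    client.execute("select key, value from src limit 10");
    List<String> rows = client.fetchAll();
    for (String row : rows) {
      System.out.println(row);
    }
    transport.close();
  }
}
{code}

Because the Hive service inherits from the Thrift metastore service, as the description notes, the same connection can also issue metadata calls.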

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-73) Thrift Server and Client for Hive

2010-06-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-73:
---

Component/s: Clients

> Thrift Server and Client for Hive
> -
>
> Key: HIVE-73
> URL: https://issues.apache.org/jira/browse/HIVE-73
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients, Server Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
> Fix For: 0.3.0
>
> Attachments: hive-73.1.patch, hive-73.10.patch, hive-73.11.patch, 
> hive-73.12.patch, hive-73.2.patch, hive-73.3.txt, hive-73.4.txt, 
> hive-73.5.txt, hive-73.6.patch, hive-73.7.patch, hive-73.8.patch, 
> hive-73.9.patch
>
>
> Currently the Hive CLI directly calls the driver code. We need to be able to 
> run a standalone Hive server that multiple clients can connect to. The Hive 
> server will allow clients to run queries as well as make metadata calls (by 
> inheriting from the Thrift metastore server).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1423) Remove Thrift/FB303 headers/src from Hive source tree

2010-06-21 Thread Carl Steinbach (JIRA)
Remove Thrift/FB303 headers/src from Hive source tree
-

 Key: HIVE-1423
 URL: https://issues.apache.org/jira/browse/HIVE-1423
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Reporter: Carl Steinbach


There is a fair amount of code from the Thrift and fb303 libraries that was
checked into the Hive source tree as part of HIVE-73. This code should be
removed, and the ODBC driver Makefile should be reworked to depend on the
contents of THRIFT_HOME and FB303_HOME as defined by the user.

{code}
./service/include/thrift/concurrency/Exception.h
./service/include/thrift/concurrency/FunctionRunner.h
./service/include/thrift/concurrency/Monitor.h
./service/include/thrift/concurrency/Mutex.h
./service/include/thrift/concurrency/PosixThreadFactory.h
./service/include/thrift/concurrency/Thread.h
./service/include/thrift/concurrency/ThreadManager.h
./service/include/thrift/concurrency/TimerManager.h
./service/include/thrift/concurrency/Util.h
./service/include/thrift/config.h
./service/include/thrift/fb303/FacebookBase.h
./service/include/thrift/fb303/FacebookService.cpp
./service/include/thrift/fb303/FacebookService.h
./service/include/thrift/fb303/fb303_constants.cpp
./service/include/thrift/fb303/fb303_constants.h
./service/include/thrift/fb303/fb303_types.cpp
./service/include/thrift/fb303/fb303_types.h
./service/include/thrift/fb303/if/fb303.thrift
./service/include/thrift/fb303/out
./service/include/thrift/fb303/ServiceTracker.h
./service/include/thrift/if/reflection_limited.thrift
./service/include/thrift/processor/PeekProcessor.h
./service/include/thrift/processor/StatsProcessor.h
./service/include/thrift/protocol/TBase64Utils.h
./service/include/thrift/protocol/TBinaryProtocol.h
./service/include/thrift/protocol/TCompactProtocol.h
./service/include/thrift/protocol/TDebugProtocol.h
./service/include/thrift/protocol/TDenseProtocol.h
./service/include/thrift/protocol/TJSONProtocol.h
./service/include/thrift/protocol/TOneWayProtocol.h
./service/include/thrift/protocol/TProtocol.h
./service/include/thrift/protocol/TProtocolException.h
./service/include/thrift/protocol/TProtocolTap.h
./service/include/thrift/reflection_limited_types.h
./service/include/thrift/server/TNonblockingServer.h
./service/include/thrift/server/TServer.h
./service/include/thrift/server/TSimpleServer.h
./service/include/thrift/server/TThreadedServer.h
./service/include/thrift/server/TThreadPoolServer.h
./service/include/thrift/Thrift.h
./service/include/thrift/TLogging.h
./service/include/thrift/TProcessor.h
./service/include/thrift/transport/TBufferTransports.h
./service/include/thrift/transport/TFDTransport.h
./service/include/thrift/transport/TFileTransport.h
./service/include/thrift/transport/THttpClient.h
./service/include/thrift/transport/TServerSocket.h
./service/include/thrift/transport/TServerTransport.h
./service/include/thrift/transport/TShortReadTransport.h
./service/include/thrift/transport/TSimpleFileTransport.h
./service/include/thrift/transport/TSocket.h
./service/include/thrift/transport/TSocketPool.h
./service/include/thrift/transport/TTransport.h
./service/include/thrift/transport/TTransportException.h
./service/include/thrift/transport/TTransportUtils.h
./service/include/thrift/transport/TZlibTransport.h
./service/include/thrift/TReflectionLocal.h
./service/lib/php/autoload.php
./service/lib/php/ext/thrift_protocol
./service/lib/php/ext/thrift_protocol/config.m4
./service/lib/php/ext/thrift_protocol/php_thrift_protocol.cpp
./service/lib/php/ext/thrift_protocol/php_thrift_protocol.h
./service/lib/php/ext/thrift_protocol/tags/1.0.0/config.m4
./service/lib/php/ext/thrift_protocol/tags/1.0.0/php_thrift_protocol.cpp
./service/lib/php/ext/thrift_protocol/tags/1.0.0/php_thrift_protocol.h
./service/lib/php/packages/fb303/FacebookService.php
./service/lib/php/packages/fb303/fb303_types.php
./service/lib/php/protocol/TBinaryProtocol.php
./service/lib/php/protocol/TProtocol.php
./service/lib/php/Thrift.php
./service/lib/php/transport/TBufferedTransport.php
./service/lib/php/transport/TFramedTransport.php
./service/lib/php/transport/THttpClient.php
./service/lib/php/transport/TMemoryBuffer.php
./service/lib/php/transport/TNullTransport.php
./service/lib/php/transport/TPhpStream.php
./service/lib/php/transport/TSocket.php
./service/lib/php/transport/TSocketPool.php
./service/lib/php/transport/TTransport.php
./service/lib/py/fb303/__init__.py
./service/lib/py/fb303/constants.py
./service/lib/py/fb303/FacebookBase.py
./service/lib/py/fb303/FacebookService-remote
./service/lib/py/fb303/FacebookService.py
./service/lib/py/fb303/ttypes.py
./service/lib/py/fb303_scripts/__init__.py
./service/lib/py/fb303_scripts/fb303_simple_mgmt.py
./service/lib/py/thrift/__init__.py
./service/lib/py/thrift/protocol
./service/lib/py/thrift/protocol/__init__.py
./service/lib/py/thrift/protocol/fastbinary.c
./service/lib/py/

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-21 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881026#action_12881026
 ] 

Paul Yang commented on HIVE-1176:
-

I was going to look at it again today, but it looks like I'll get to it around 
mid-day tomorrow. I'll keep this issue posted.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-21 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881024#action_12881024
 ] 

Arvind Prabhakar commented on HIVE-1271:


Is anyone reviewing this change? Thanks.

> Case sensitiveness of type information specified when using custom reducer 
> causes type mismatch
> ---
>
> Key: HIVE-1271
> URL: https://issues.apache.org/jira/browse/HIVE-1271
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1271-1.patch, HIVE-1271.patch
>
>
> Type information specified while using a custom reduce script is converted 
> to lower case, causing a type mismatch during query semantic analysis. The 
> following REDUCE query, where the field name is "userId", failed.
> hive> CREATE TABLE SS (
>> a INT,
>> b INT,
>> vals ARRAY>
>> );
> OK
> hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
>> INSERT OVERWRITE TABLE SS
>> REDUCE *
>> USING 'myreduce.py'
>> AS
>> (a INT,
>> b INT,
>> vals ARRAY>
>> )
>> ;
> FAILED: Error in semantic analysis: line 2:27 Cannot insert into
> target table because column number/types are different SS: Cannot
> convert column 2 from array> to
> array>.
> The same query worked fine after changing "userId" to "userid".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-21 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881023#action_12881023
 ] 

Arvind Prabhakar commented on HIVE-1176:


@Paul: Any updates on this from your end? Thanks.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-06-21 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881020#action_12881020
 ] 

Arvind Prabhakar commented on HIVE-287:
---

@John: Can you please take a look at the updated patch? Let me know if you have 
any feedback so I can tweak this change further as necessary.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-21 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-4.patch

Applies cleanly on trunk and branch-0.6.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
> HIVE-287-4.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-06-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1422:
-

Attachment: HIVE-1422.1.patch

> skip counter update when RunningJob.getCounters() returns null
> --
>
> Key: HIVE-1422
> URL: https://issues.apache.org/jira/browse/HIVE-1422
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1422.1.patch
>
>
> Under heavy load circumstances on some Hadoop versions, we may get an NPE from 
> trying to dereference a null Counters object.  I don't have a unit test which 
> can reproduce it, but here's an example stack from a production cluster we 
> saw today:
> 10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
> with exception 'java.lang.NullPointerException(null)'
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
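
The issue title describes the fix: guard the counter update when getCounters() returns null rather than dereferencing the result. A minimal sketch of that pattern, with an illustrative class and method name rather than the actual ExecDriver code:

{code}
import java.io.IOException;

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.RunningJob;

public final class CounterGuardSketch {
  // Under heavy load, some Hadoop versions return null from getCounters();
  // skip this round of updates rather than failing the query with an NPE.
  static void updateCountersSafely(RunningJob rj) throws IOException {
    Counters ctrs = rj.getCounters();
    if (ctrs == null) {
      return; // counters unavailable right now; the next progress poll retries
    }
    // ... refresh operator-level statistics from ctrs ...
  }
}
{code}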

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1422) skip counter update when RunningJob.getCounters() returns null

2010-06-21 Thread John Sichi (JIRA)
skip counter update when RunningJob.getCounters() returns null
--

 Key: HIVE-1422
 URL: https://issues.apache.org/jira/browse/HIVE-1422
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0


Under heavy load circumstances on some Hadoop versions, we may get an NPE from 
trying to dereference a null Counters object.  I don't have a unit test which 
can reproduce it, but here's an example stack from a production cluster we saw 
today:

10/06/21 13:01:10 ERROR exec.ExecDriver: Ended Job = job_201005200457_701060 
with exception 'java.lang.NullPointerException(null)'
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Operator.updateCounters(Operator.java:999)
at org.apache.hadoop.hive.ql.exec.ExecDriver.updateCounters(ExecDriver.java:503)
at org.apache.hadoop.hive.ql.exec.ExecDriver.progress(ExecDriver.java:390)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:697)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary

2010-06-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881004#action_12881004
 ] 

Namit Jain commented on HIVE-1417:
--

will review

> Archived partitions throw error with queries calling getContentSummary
> --
>
> Key: HIVE-1417
> URL: https://issues.apache.org/jira/browse/HIVE-1417
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch
>
>
> Assuming you have a src table with a ds='1' partition that is archived in 
> HDFS, the following query will throw an exception
> {code}
> select count(1) from src where ds='1' group by key;
> {code}
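
For orientation, the sketch below shows the kind of content-summary call that trips over archived partitions. It is illustrative only: it assumes the key point is resolving the filesystem from the partition path itself (so a har:// URI reaches the HAR filesystem) rather than using the default warehouse filesystem, and it is not the patch under review.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class PartitionSummarySketch {
  // An archived partition sits behind a har:// URI, so ask the path for its
  // own filesystem before computing the summary the query planner needs.
  static ContentSummary summarize(Configuration conf, Path partitionPath)
      throws IOException {
    FileSystem fs = partitionPath.getFileSystem(conf);
    return fs.getContentSummary(partitionPath);
  }
}
{code}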

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1421) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1421.


Hadoop Flags: [Reviewed]
  Resolution: Fixed

Namit sent me the patch, and we tested and committed it during the JIRA downtime. 
Will close this.

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1421
> URL: https://issues.apache.org/jira/browse/HIVE-1421
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: Namit Jain
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive.1421.2.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1421) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881001#action_12881001
 ] 

Namit Jain commented on HIVE-1421:
--

patch for 0.6 and trunk

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1421
> URL: https://issues.apache.org/jira/browse/HIVE-1421
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: Namit Jain
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive.1421.2.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1421) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1421:
-

Attachment: hive.1421.2.patch

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1421
> URL: https://issues.apache.org/jira/browse/HIVE-1421
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: He Yongqiang
>Assignee: Namit Jain
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive.1421.2.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1412:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks, Ning

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1412.2.patch, HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions, since all input paths are directories. However, 
> this breaks when the inputs are files (in which case tablesample could be the 
> use case). CombineHiveInputFormat should handle the case where the inputs may 
> also be non-directories. 
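
A minimal sketch of the grouping rule the description asks for: combine only within a partition, and when an input path is a plain file (the tablesample case), key it by its parent directory instead of assuming every input path is itself a partition directory. Names are illustrative, not the actual CombineHiveInputFormat code.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CombineKeySketch {
  // Group input paths by partition directory; files fall back to their parent.
  static Map<Path, List<Path>> groupByPartition(FileSystem fs, List<Path> inputs)
      throws IOException {
    Map<Path, List<Path>> groups = new HashMap<Path, List<Path>>();
    for (Path p : inputs) {
      Path key = fs.getFileStatus(p).isDir() ? p : p.getParent();
      List<Path> group = groups.get(key);
      if (group == null) {
        group = new ArrayList<Path>();
        groups.put(key, group);
      }
      group.add(p);
    }
    return groups;
  }
}
{code}

Splits would then be formed within each group, never across two groups, which keeps combined splits inside one partition.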

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1420) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-1420.


Resolution: Duplicate

Duplicate of HIVE-1421.

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1420
> URL: https://issues.apache.org/jira/browse/HIVE-1420
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1420.1.patch
>
>
> drop table foo;
> create table foo (src int, value string) partitioned by (ds string);
> alter table foo set fileformat Sequencefile;
> insert overwrite table foo partition (ds='1')
> select key, value from src;
> alter table foo add partition (ds='2');
> alter table foo set fileformat rcfile;
> select count(1) from foo;
> The above test case fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1420) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1420:
---

Status: Open  (was: Patch Available)

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1420
> URL: https://issues.apache.org/jira/browse/HIVE-1420
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1420.1.patch
>
>
> drop table foo;
> create table foo (src int, value string) partitioned by (ds string);
> alter table foo set fileformat Sequencefile;
> insert overwrite table foo partition (ds='1')
> select key, value from src;
> alter table foo add partition (ds='2');
> alter table foo set fileformat rcfile;
> select count(1) from foo;
> The above test case fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1421) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread He Yongqiang (JIRA)
problem with sequence and rcfiles are mixed for null partitions
---

 Key: HIVE-1421
 URL: https://issues.apache.org/jira/browse/HIVE-1421
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: He Yongqiang
Assignee: Namit Jain
 Fix For: 0.6.0, 0.7.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-06-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1414:
-

Fix Version/s: 0.7.0

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1414) automatically invoke .hiverc init script

2010-06-21 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880992#action_12880992
 ] 

John Sichi commented on HIVE-1414:
--

OK, those choices make sense.

Some review comments:

1) In HIVE-1405, I added a processFile method which takes care of closing the 
reader to avoid a resource leak.  Could you review and commit that patch, and 
then update your patch here to call processFile?

2) If either getenv or getProperty returns null, we should skip the 
corresponding exists check completely, to avoid looking for a filename like 
"null/bin/.hiverc" (see the sketch after this list).

3) I think your code needs to move up into my processInitFiles location; 
otherwise it won't get run for the -f and -e cases.  Also, let's say that if -i 
is specified, then we skip the .hiverc execution (to match bash --init-file 
behavior).  Note that .hiverc execution should happen inside my silent-mode 
block so that it does not show up in console output.
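
To make comment 2 concrete, here is a minimal sketch of the null guard, assuming hypothetical candidate locations (HIVE_HOME/bin/.hiverc and a user-home .hiverc); it illustrates the pattern only, not the patch under review:

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class HivercLookupSketch {
  // Probe a candidate only when its base directory is defined, so a null
  // HIVE_HOME or user.home never becomes a literal "null/bin/.hiverc" path.
  static List<File> findInitFiles() {
    List<File> found = new ArrayList<File>();
    addIfExists(found, System.getenv("HIVE_HOME"), "bin/.hiverc");
    addIfExists(found, System.getProperty("user.home"), ".hiverc");
    return found;
  }

  private static void addIfExists(List<File> found, String base, String rel) {
    if (base == null) {
      return; // base undefined: skip the exists() check entirely
    }
    File candidate = new File(base, rel);
    if (candidate.exists()) {
      found.add(candidate);
    }
  }
}
{code}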
 

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Attachments: hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary

2010-06-21 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1417:


Attachment: HIVE-1417.1.patch
HIVE-1417.branch-0.6.1.patch

Expanded test coverage to include join and group by, but this bug doesn't show 
up during unit tests because the underlying filesystem is not HDFS.

> Archived partitions throw error with queries calling getContentSummary
> --
>
> Key: HIVE-1417
> URL: https://issues.apache.org/jira/browse/HIVE-1417
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch
>
>
> Assuming you have a src table with a ds='1' partition that is archived in 
> HDFS, the following query will throw an exception
> {code}
> select count(1) from src where ds='1' group by key;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1405) hive command line option -i to run an init file before other SQL commands

2010-06-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1405:
-

Fix Version/s: 0.7.0
   (was: 0.6.0)

> hive command line option -i to run an init file before other SQL commands
> -
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: Jonathan Chang
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1405.1.patch
>
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary

2010-06-21 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1417:


   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0

> Archived partitions throw error with queries calling getContentSummary
> --
>
> Key: HIVE-1417
> URL: https://issues.apache.org/jira/browse/HIVE-1417
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch
>
>
> Assuming you have a src table with a ds='1' partition that is archived in 
> HDFS, the following query will throw an exception
> {code}
> select count(1) from src where ds='1' group by key;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1417) Archived partitions throw error with queries calling getContentSummary

2010-06-21 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1417:


Description: 
Assuming you have a src table with a ds='1' partition that is archived in HDFS, 
the following query will throw an exception

{code}
select count(1) from src where ds='1' group by key;
{code}


  was:
Assuming you have a src table with a ds='1' partition that is archived, the 
following table will throw an exception

{code}
select count(1) from src where ds='1' group by key;
{code}


> Archived partitions throw error with queries calling getContentSummary
> --
>
> Key: HIVE-1417
> URL: https://issues.apache.org/jira/browse/HIVE-1417
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Attachments: HIVE-1417.1.patch, HIVE-1417.branch-0.6.1.patch
>
>
> Assuming you have a src table with a ds='1' partition that is archived in 
> HDFS, the following query will throw an exception
> {code}
> select count(1) from src where ds='1' group by key;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: about to cut the 0.6 release branch...

2010-06-21 Thread John Sichi
On Jun 20, 2010, at 5:37 PM,  wrote:
> * for cases where a single patch is being applied to both trunk and branch, 
> commit to trunk first, then merge that to branch (rather than reapplying the 
> patch on branch independently); someone please correct me if I have this  
> wrong


Correction from a conversation with Namit: apparently we never use merge; we 
always apply the patch on the branch directly.

JVS



[jira] Updated: (HIVE-1359) Unit test should be shim-aware

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1359:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   0.7.0
Fix Version/s: 0.7.0

> Unit test should be shim-aware
> --
>
> Key: HIVE-1359
> URL: https://issues.apache.org/jira/browse/HIVE-1359
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1359.patch, unit_tests.txt
>
>
> Some features in Hive only work for certain Hadoop versions through shims. 
> However, the unit test structure is not shim-aware: there is only one set of 
> queries and expected outputs for all Hadoop versions. This is not sufficient 
> when different Hadoop versions produce different output. One example is 
> CombineHiveInputFormat, which is only available from Hadoop 0.20; the plans 
> using CombineHiveInputFormat and HiveInputFormat may be different. Another 
> example is archived partitions (HAR), which are also only available from 
> 0.20. 
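
One way to picture a shim-aware harness is a version-keyed lookup for expected outputs. The sketch below is purely hypothetical (the q.out.<version> naming scheme is invented, not Hive's actual test layout), showing the fallback behavior such a harness might need:

{code}
import java.io.File;

public final class ExpectedOutputSketch {
  // Prefer an expected-output file recorded for the Hadoop version under
  // test (e.g. q.out.0.20); otherwise fall back to the shared q.out file.
  static File expectedOutputFor(File resultsDir, String query, String hadoopVersion) {
    File versioned = new File(resultsDir, query + ".out." + hadoopVersion);
    return versioned.exists() ? versioned : new File(resultsDir, query + ".out");
  }
}
{code}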

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1416:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   0.7.0
Fix Version/s: 0.6.0
   0.7.0

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1416.patch
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different Hadoop versions and 
> execution modes have different ways of naming the output files produced by 
> mappers/reducers. We need to move the parsing code to shims. 
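
A minimal sketch of the shim idea described above: each Hadoop version and execution mode supplies its own file-name pattern, so the code that creates files for empty buckets never hard-codes one naming scheme. The interface name and regex handling are assumptions, not the real shim API.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface TaskIdShim {
  // Return the task ID embedded in a mapper/reducer output file name,
  // or null if the name does not match this version's naming scheme.
  String extractTaskId(String fileName);
}

class RegexTaskIdShim implements TaskIdShim {
  private final Pattern pattern;

  RegexTaskIdShim(String regex) {
    this.pattern = Pattern.compile(regex);
  }

  public String extractTaskId(String fileName) {
    Matcher m = pattern.matcher(fileName);
    return m.matches() ? m.group(1) : null;
  }
}
{code}

A 0.17 local-mode shim and a 0.20 shim would then differ only in the regex each passes to the constructor.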

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1359) Unit test should be shim-aware

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1359:
-

Attachment: HIVE-1359.patch

> Unit test should be shim-aware
> --
>
> Key: HIVE-1359
> URL: https://issues.apache.org/jira/browse/HIVE-1359
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1359.patch, unit_tests.txt
>
>
> Some features in Hive only work for certain Hadoop versions through shims. 
> However, the unit test structure is not shim-aware: there is only one set of 
> queries and expected outputs for all Hadoop versions. This is not sufficient 
> when different Hadoop versions produce different output. One example is 
> CombineHiveInputFormat, which is only available from Hadoop 0.20; the plans 
> using CombineHiveInputFormat and HiveInputFormat may be different. Another 
> example is archived partitions (HAR), which are also only available from 
> 0.20. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1412:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   0.7.0
Fix Version/s: 0.7.0

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1412.2.patch, HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions, since all input paths are directories. However, 
> this breaks when the inputs are files (in which case tablesample could be the 
> use case). CombineHiveInputFormat should handle the case where the inputs may 
> also be non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1416:
-

Attachment: HIVE-1416.patch

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> --
>
> Key: HIVE-1416
> URL: https://issues.apache.org/jira/browse/HIVE-1416
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1416.patch
>
>
> Hive parses the file name generated by tasks to figure out the task ID in 
> order to generate files for empty buckets. Different Hadoop versions and 
> execution modes have different ways of naming the output files produced by 
> mappers/reducers. We need to move the parsing code to shims. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1412) CombineHiveInputFormat bug on tablesample

2010-06-21 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1412:
-

Attachment: HIVE-1412.2.patch

Added a unit test

> CombineHiveInputFormat bug on tablesample
> -
>
> Key: HIVE-1412
> URL: https://issues.apache.org/jira/browse/HIVE-1412
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1412.2.patch, HIVE-1412.patch
>
>
> CombineHiveInputFormat should combine all files inside one partition to form 
> a split, but should not take files across partition boundaries. This works for 
> regular tables and partitions, since all input paths are directories. However, 
> this breaks when the inputs are files (in which case tablesample could be the 
> use case). CombineHiveInputFormat should handle the case where the inputs may 
> also be non-directories. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Latest trunk breaks logging?

2010-06-21 Thread Joydeep Sen Sarma
HIVE-543 changes logging for local mode. By default, the logs from local-mode 
map-reduce tasks now go to a per-query log file that's written to 
/tmp//.log

This was very much intended, to make local mode a bit friendlier to use.
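
As an illustration of how per-query log redirection of this sort can be wired up with log4j, here is a generic sketch; it is not the HIVE-543 patch, and the file-name scheme (/tmp/<user>/<queryId>.log) is an assumption:

{code}
import java.io.IOException;

import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public final class PerQueryLogSketch {
  // Route subsequent log output to a per-query file such as
  // /tmp/<user>/<queryId>.log; the naming scheme here is an assumption.
  static FileAppender attachQueryLog(String user, String queryId) throws IOException {
    String path = "/tmp/" + user + "/" + queryId + ".log";
    FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} %-5p %c{2} - %m%n"), path);
    appender.setName("query-" + queryId);
    Logger.getRootLogger().addAppender(appender);
    return appender; // caller detaches it when the query finishes
  }
}
{code}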

-Original Message-
From: Mayank Lahiri [mailto:mayank.lah...@facebook.com] 
Sent: Monday, June 21, 2010 2:54 PM
To: John Sichi
Cc: hive-dev@hadoop.apache.org
Subject: Re: Latest trunk breaks logging?

Hi Joydeep,

I've confirmed that logging is restored after reverting HIVE-543. The problem 
is that LOG.warn() calls from inside UDAFs do not generate any output after 
applying HIVE-543. For example, passing an invalid argument to histogram() 
causes a general one-line exception instead of the diagnostic HiveException 
that is supposed to be thrown. This is what HEAD currently returns:

- snip --
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
 end snip 

Was this intended, or are the logs just being written to some other location 
post-HIVE-543?

Thanks,
-Mayank


On 6/21/10 1:00 PM, "John Sichi"  wrote:

+hive-dev

Looks like Joydeep made some changes to the logging in recently committed 
HIVE-543; maybe related?

Apache JIRA is down at the moment but I found the patch here:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-commits/201006.mbox/%3c20100616225037.c67df2388...@eris.apache.org%3e

Mayank, if you can confirm that this is the cause, we can check with Joydeep on 
whether or not this was intentional.

The DataNucleus noise I've been seeing for a while now; I think it's harmless, 
but go ahead and create a JIRA issue to get it cleaned up.

JVS

On Jun 21, 2010, at 12:34 PM, Mayank Lahiri wrote:

Hi John,

I just updated trunk and logging seems to be slightly different, and possibly 
broken. For one, LOG.warn() messages from inside UDAFs don't show up anywhere, 
and are not printed to console. /tmp/mlahiri/hive.log contains a lot of lines 
like this:

2010-06-21 12:02:01,486 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:02:01,487 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:23:23,576 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:23:23,578 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:23:23,579 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:27:45,003 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:30:13,786 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:30:13,788 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:30:13,789 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
~

Any ideas what I could be doing differently/wrong?

- Mayank




Re: Latest trunk breaks logging?

2010-06-21 Thread Mayank Lahiri
Hi Joydeep,

I’ve confirmed that logging is restored after reverting HIVE-543. The problem 
is that LOG.warn() calls from inside UDAFs do not generate any output after 
applying HIVE-543. For example, passing an invalid argument to histogram() 
causes a general one-line exception instead of the diagnostic HiveException 
that is supposed to be thrown. This is what HEAD currently returns:

- snip --
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
 end snip 

Was this intended, or are the logs just being written to some other location 
post-HIVE-543?

Thanks,
-Mayank


On 6/21/10 1:00 PM, "John Sichi"  wrote:

+hive-dev

Looks like Joydeep made some changes to the logging in recently committed 
HIVE-543; maybe related?

Apache JIRA is down at the moment but I found the patch here:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-commits/201006.mbox/%3c20100616225037.c67df2388...@eris.apache.org%3e

Mayank, if you can confirm that this is the cause, we can check with Joydeep on 
whether or not this was intentional.

The DataNucleus noise I've been seeing for a while now; I think it's harmless, 
but go ahead and create a JIRA issue to get it cleaned up.

JVS

On Jun 21, 2010, at 12:34 PM, Mayank Lahiri wrote:

Hi John,

I just updated trunk and logging seems to be slightly different, and possibly 
broken. For one, LOG.warn() messages from inside UDAFs don’t show up anywhere, 
and are not printed to console. /tmp/mlahiri/hive.log contains a lot of lines 
like this:

2010-06-21 12:02:01,486 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:02:01,487 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:23:23,576 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:23:23,578 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:23:23,579 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:27:45,003 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:30:13,786 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:30:13,788 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:30:13,789 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
~

Any ideas what I could be doing differently/wrong?

- Mayank




Re: Latest trunk breaks logging?

2010-06-21 Thread John Sichi
+hive-dev

Looks like Joydeep made some changes to the logging in recently committed 
HIVE-543; maybe related?

Apache JIRA is down at the moment but I found the patch here:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-commits/201006.mbox/%3c20100616225037.c67df2388...@eris.apache.org%3e

Mayank, if you can confirm that this is the cause, we can check with Joydeep on 
whether or not this was intentional.

The DataNucleus noise I've been seeing for a while now; I think it's harmless, 
but go ahead and create a JIRA issue to get it cleaned up.

JVS

On Jun 21, 2010, at 12:34 PM, Mayank Lahiri wrote:

Hi John,

I just updated trunk and logging seems to be slightly different, and possibly 
broken. For one, LOG.warn() messages from inside UDAFs don’t show up anywhere, 
and are not printed to console. /tmp/mlahiri/hive.log contains a lot of lines 
like this:

2010-06-21 12:02:01,486 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:02:01,487 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:23:23,576 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:23:23,578 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:23:23,579 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:27:45,003 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:27:45,005 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2010-06-21 12:30:13,786 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2010-06-21 12:30:13,788 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2010-06-21 12:30:13,789 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
~

Any ideas what I could be doing differently/wrong?

- Mayank



[jira] Commented: (HIVE-1419) Policy on deserialization errors

2010-06-21 Thread Vladimir Klimontovich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880909#action_12880909
 ] 

Vladimir Klimontovich commented on HIVE-1419:
-

If it works fine for you now, it won't be broken by this patch.

> Policy on deserialization errors
> 
>
> Key: HIVE-1419
> URL: https://issues.apache.org/jira/browse/HIVE-1419
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
> Attachments: corrupted_records_0.5.patch, 
> corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
> corrupted_records_trunk_ver2.patch
>
>
> When the deserializer throws an exception, the whole map task fails (see 
> MapOperator.java). This is not always convenient behavior, especially on 
> huge datasets where a few corrupted lines can be normal. 
> Proposed solution:
> 1) Keep a counter of corrupted records.
> 2) When the counter exceeds a limit (configurable via the 
> hive.max.deserializer.errors property, 0 by default), throw an exception; 
> otherwise, just log the exception at WARN level.
> Patches for the 0.5 branch and trunk are attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1419) Policy on deserialization errors

2010-06-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880908#action_12880908
 ] 

Edward Capriolo commented on HIVE-1419:
---

I am looking through this and trying to wrap my head around it. Offhand, do you 
know what happens in the following situation?

We have a table that started out like this:

create table tab (a int, b int);

Over time we have added more columns:

alter table tab replace columns (a int, b int, c int);

This works fine for us, as selecting column c on older data returns null for 
that column. Will this behaviour be preserved?
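
(Illustrative only, and not Hive's actual SerDe code: a sketch of why rows 
written before the schema change read back with null for the new column. A 
delimited row is split into at most schema-width fields, and trailing fields 
the row never contained stay null.)

{code}
public class MissingColumnDemo {
  // Split a delimited row into exactly 'schemaWidth' fields; fields the
  // row does not contain remain null, which is what a SELECT sees.
  public static String[] readRow(String line, char sep, int schemaWidth) {
    String[] fields = new String[schemaWidth]; // all null initially
    int start = 0, col = 0;
    for (int i = 0; i <= line.length() && col < schemaWidth; i++) {
      if (i == line.length() || line.charAt(i) == sep) {
        fields[col++] = line.substring(start, i);
        start = i + 1;
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    // An old row written when the table was (a int, b int), read with the
    // three-column schema (a, b, c): c comes back null.
    String[] row = readRow("1,2", ',', 3);
    System.out.println(row[0] + " " + row[1] + " " + row[2]); // prints: 1 2 null
  }
}
{code}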

> Policy on deserialization errors
> 
>
> Key: HIVE-1419
> URL: https://issues.apache.org/jira/browse/HIVE-1419
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
> Attachments: corrupted_records_0.5.patch, 
> corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
> corrupted_records_trunk_ver2.patch
>
>
> When the deserializer throws an exception, the whole map task fails (see 
> MapOperator.java). This is not always convenient behavior, especially on 
> huge datasets where a few corrupted lines can be normal. 
> Proposed solution:
> 1) Keep a counter of corrupted records.
> 2) When the counter exceeds a limit (configurable via the 
> hive.max.deserializer.errors property, 0 by default), throw an exception; 
> otherwise, just log the exception at WARN level.
> Patches for the 0.5 branch and trunk are attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1420) problem with sequence and rcfiles are mixed for null partitions

2010-06-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880904#action_12880904
 ] 

Namit Jain commented on HIVE-1420:
--

+1


will commit if the tests pass

> problem with sequence and rcfiles are mixed for null partitions
> ---
>
> Key: HIVE-1420
> URL: https://issues.apache.org/jira/browse/HIVE-1420
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: Namit Jain
>Assignee: He Yongqiang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1420.1.patch
>
>
> drop table foo;
> create table foo (src int, value string) partitioned by (ds string);
> alter table foo set fileformat Sequencefile;
> insert overwrite table foo partition (ds='1')
> select key, value from src;
> alter table foo add partition (ds='2');
> alter table foo set fileformat rcfile;
> select count(1) from foo;
> The above test case fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-801) row-wise IN would be useful

2010-06-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-801:


Affects Version/s: 0.5.0
   0.4.1
   0.4.0
   0.3.0
  Component/s: Query Processor

> row-wise IN would be useful
> ---
>
> Key: HIVE-801
> URL: https://issues.apache.org/jira/browse/HIVE-801
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0
>Reporter: Adam Kramer
>Assignee: Paul Yang
> Fix For: 0.6.0
>
> Attachments: HIVE-801.1.patch, HIVE-801.2.patch, HIVE-801.3.patch
>
>
> SELECT * FROM tablename t
> WHERE IN(12345, key1, key2, key3);
> ...IN would operate on a given row and return true when the first argument 
> equaled at least one of the other arguments. So here IN would return true if 
> 12345=key1 OR 12345=key2 OR 12345=key3 (but wouldn't test the latter two if 
> the first matched).
> This would also help with https://issues.apache.org/jira/browse/HIVE-783, if 
> IN were implemented in a manner that allows it to be used in an ON clause.
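
(A sketch of the proposed short-circuit semantics, using a hypothetical helper 
name; this is not the HIVE-801 patch itself.)

{code}
public class RowWiseIn {
  // IN(x, k1, k2, ...): true iff x equals some ki, testing left to right
  // and stopping at the first match.
  public static boolean in(Object first, Object... candidates) {
    if (first == null) {
      return false; // in this sketch, null matches nothing
    }
    for (Object candidate : candidates) {
      if (first.equals(candidate)) {
        return true; // short-circuit: later candidates are not tested
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(in(12345, 12345, 99999)); // true; 99999 is never tested
    System.out.println(in(12345, 1, 2, 3));      // false
  }
}
{code}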

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



skew join in hive

2010-06-21 Thread Gang Luo
Hi,
I saw the skew handling strategy mentioned in HIVE-964. Here are some 
questions.
1. How do we get the big keys for a table? Launch an MR job to build a 
histogram on each table?
2. Now that we have the big/skewed keys, do we also have the small/non-skewed 
keys? Do we process the non-skewed keys in the same way (replicated join), or 
in the traditional way (redistribution join)?

Thanks,
-Gang






[jira] Updated: (HIVE-1419) Policy on deserialization errors

2010-06-21 Thread Vladimir Klimontovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Klimontovich updated HIVE-1419:


Attachment: corrupted_records_0.5_ver2.patch
corrupted_records_trunk_ver2.patch

A slightly improved version of the patch is attached. MapOperator now skips a 
record if the deserializer returns null.

This makes the deserializer plugin architecture more flexible: if a 
deserializer considers a record nonsense (corrupted, empty, whatever), it can 
simply return null to signal Hive not to process it.
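
(A minimal, self-contained sketch of the policy described above, using 
hypothetical names; this is not the actual MapOperator code.)

{code}
public class CorruptedRecordPolicy {
  private final long maxErrors; // e.g. hive.max.deserializer.errors, 0 by default
  private long corrupted = 0;

  public CorruptedRecordPolicy(long maxErrors) {
    this.maxErrors = maxErrors;
  }

  /** Returns true if the row should be forwarded, false if it is skipped. */
  public boolean admit(Object deserializedRow) {
    if (deserializedRow != null) {
      return true; // a normal record
    }
    corrupted++; // null from the deserializer marks a corrupted/empty record
    if (corrupted > maxErrors) {
      throw new RuntimeException(
          "Exceeded limit of " + maxErrors + " corrupted records");
    }
    System.err.println("WARN: skipping corrupted record #" + corrupted);
    return false;
  }
}
{code}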

> Policy on deserialization errors
> 
>
> Key: HIVE-1419
> URL: https://issues.apache.org/jira/browse/HIVE-1419
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
> Attachments: corrupted_records_0.5.patch, 
> corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
> corrupted_records_trunk_ver2.patch
>
>
> When the deserializer throws an exception, the whole map task fails (see 
> MapOperator.java). This is not always convenient behavior, especially on 
> huge datasets where a few corrupted lines can be normal. 
> Proposed solution:
> 1) Keep a counter of corrupted records.
> 2) When the counter exceeds a limit (configurable via the 
> hive.max.deserializer.errors property, 0 by default), throw an exception; 
> otherwise, just log the exception at WARN level.
> Patches for the 0.5 branch and trunk are attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.