[jira] Updated: (HIVE-1609) Support partition filtering in metastore

2010-08-31 Thread Ajay Kidave (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kidave updated HIVE-1609:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0

> Support partition filtering in metastore
> 
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Ajay Kidave
> Fix For: 0.6.0
>
> Attachments: hive_1609.patch
>
>
> The metastore needs to have support for returning a list of partitions based 
> on user-specified filter conditions. This will be useful for tools which need 
> to do partition pruning; Howl is one such use case. The way partition pruning 
> is done during Hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1609) Support partition filtering in metastore

2010-08-31 Thread Ajay Kidave (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kidave updated HIVE-1609:
--

Attachment: hive_1609.patch

Attached a patch with support for a new metastore API which returns a list of 
partitions matching a specified string filter. Thrift does not support recursive 
nested structures, so the filter is specified as a string instead of an 
expression object. The DataNucleus jar version is upgraded to get support for 
the JDOQL operations needed (a clean build is required at the root level to 
remove the older DataNucleus jars from build/ivy/lib).
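
For reference, a rough sketch of how a tool such as Howl might call the new 
API; the listPartitionsByFilter method name, its exact signature, and the 
filter syntax shown here are assumptions based on this description, not taken 
from the patch:

import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionFilterExample {
  public static void main(String[] args) throws Exception {
    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    // The filter is a plain string because Thrift cannot express a recursive
    // expression tree; the metastore translates it into JDOQL.
    List<Partition> matching = client.listPartitionsByFilter(
        "default", "page_views", "ds > \"2010-08-01\" and country = \"US\"",
        (short) -1);
    for (Partition p : matching) {
      System.out.println(p.getValues());   // partition key values
    }
    client.close();
  }
}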

> Support partition filtering in metastore
> 
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Ajay Kidave
> Attachments: hive_1609.patch
>
>
> The metastore needs to have support for returning a list of partitions based 
> on user-specified filter conditions. This will be useful for tools which need 
> to do partition pruning; Howl is one such use case. The way partition pruning 
> is done during Hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1609) Support partition filtering in metastore

2010-08-31 Thread Ajay Kidave (JIRA)
Support partition filtering in metastore


 Key: HIVE-1609
 URL: https://issues.apache.org/jira/browse/HIVE-1609
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.5.0
Reporter: Ajay Kidave


The metastore needs to have support for returning a list of partitions based on 
user-specified filter conditions. This will be useful for tools which need to 
do partition pruning; Howl is one such use case. The way partition pruning is 
done during Hive query execution need not be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Deserializing map column via JDBC (HIVE-1378)

2010-08-31 Thread Steven Wong
Upon further inspection, LazySimpleSerDe has the ability to serialize 
non-primitives into JSON but doesn't have the reverse ability to deserialize 
JSON back.

Here's my proposal:

1. By default, hive.fetch.output.format = display.
2. When JDBC driver connects to Hive server, execute "set 
hive.fetch.output.format = ctrl".
3. In Hive server:
  (a) If hive.fetch.output.format == display, FetchTask initializes 
LazySimpleSerDe as it does today (field delimiter = tab, null sequence = 
"NULL", useJSONSerialize = true).
  (b) If hive.fetch.output.format == ctrl, FetchTask initializes 
LazySimpleSerDe to ctrl-delimit everything. This is LazySimpleSerDe's default 
behavior anyway if it's initialized with the schema (it isn't today).
4. JDBC driver deserializes with LazySimpleSerDe instead of DynamicSerDe.
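
A rough sketch of how 3(a) and 3(b) might differ (illustrative only, not the 
actual FetchTask code; "field.delim", "serialization.null.format", "columns", 
and "columns.types" are the standard serde properties, while the example 
schema values are invented):

import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;

public class FetchSerDeConfig {
  // 3(a) "display": tab-delimited fields, nulls rendered as "NULL"; FetchTask
  // additionally turns on JSON serialization for non-primitives (omitted here).
  public static LazySimpleSerDe displaySerDe(Configuration conf) throws Exception {
    Properties p = new Properties();
    p.setProperty("field.delim", "\t");
    p.setProperty("serialization.null.format", "NULL");
    LazySimpleSerDe serde = new LazySimpleSerDe();
    serde.initialize(conf, p);   // note: no schema, as today
    return serde;
  }

  // 3(b) "ctrl": initialize with the query schema and no explicit delimiters,
  // so LazySimpleSerDe falls back to its ctrl-A/ctrl-B/ctrl-C defaults.
  public static LazySimpleSerDe ctrlSerDe(Configuration conf) throws Exception {
    Properties p = new Properties();
    p.setProperty("columns", "userid,prefs");                  // invented schema
    p.setProperty("columns.types", "int,map<string,string>");
    LazySimpleSerDe serde = new LazySimpleSerDe();
    serde.initialize(conf, p);
    return serde;
  }
}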

Your feedback?

My only remaining concern is that, for "select * from partitioned_table", 3(b) 
might require fixing HIVE-1573 as well, because I hit a partition-column 
problem when I tried 3(b) in the debugger. I hope HIVE-1573 can be fixed 
separately, but I don't know yet; I'll have to see.

Steven


-Original Message-
From: Steven Wong 
Sent: Friday, August 27, 2010 2:24 PM
To: hive-dev@hadoop.apache.org; 'John Sichi'
Cc: Zheng Shao; Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

A related jira is HIVE-1606 (For a null value in a string column, JDBC driver 
returns the string "NULL"). What happens is the server-side serde already turns 
the null into "NULL". Both null and "NULL" are serialized as "NULL"; the 
client-side serde has no hope. I bring this jira up to point out that JDBC's 
server side uses a serialization format that appears intended for display 
(human consumption) instead of deserialization. The mixing of non-JSON and JSON 
serializations is perhaps another manifestation.

Also, fixing HIVE-1606 will obviously require a server-side change. Both 
HIVE-1606 and HIVE-1378 (the jira at hand) can share some server-side change, 
if HIVE-1378 ends up changing the server side too.

Steven


-Original Message-
From: John Sichi [mailto:jsi...@facebook.com] 
Sent: Friday, August 27, 2010 11:29 AM
To: Steven Wong
Cc: Zheng Shao; hive-dev@hadoop.apache.org; Jerome Boulon
Subject: Re: Deserializing map column via JDBC (HIVE-1378)

I don't know enough about the serdes to say whether that's a problem...maybe 
someone else does?  It seems like as long as the JSON form doesn't include the 
delimiter unescaped, it might work?

JVS

On Aug 26, 2010, at 6:29 PM, Steven Wong wrote:

That sounds like it'll work, at least conceptually. But if the row contains 
primitive and non-primitive columns, the row serialization will be a mix of 
non-JSON and JSON serializations, right? Is that a good thing?


From: John Sichi [mailto:jsi...@facebook.com]
Sent: Thursday, August 26, 2010 12:11 PM
To: Steven Wong
Cc: Zheng Shao; hive-dev@hadoop.apache.org; 
Jerome Boulon
Subject: Re: Deserializing map column via JDBC (HIVE-1378)

If you replace DynamicSerDe with LazySimpleSerDe on the JDBC client side, can't 
you then tell it to expect JSON serialization for the maps?  That way you can 
leave the FetchTask server side as is.

JVS

On Aug 24, 2010, at 2:50 PM, Steven Wong wrote:


I got sidetracked for a while.

Looking at client.fetchOne, it is a call to the Hive server, which shows the 
following call stack:

SerDeUtils.getJSONString(Object, ObjectInspector) line: 205
LazySimpleSerDe.serialize(Object, ObjectInspector) line: 420
FetchTask.fetch(ArrayList) line: 130
Driver.getResults(ArrayList) line: 660
HiveServer$HiveServerHandler.fetchOne() line: 238

In other words, FetchTask.mSerde (an instance of LazySimpleSerDe) serializes 
the map column into JSON strings. It's because FetchTask.mSerde has been 
initialized by FetchTask.initialize to do it that way.

It appears that the fix is to initialize FetchTask.mSerde differently to do 
ctrl-serialization instead - presumably for the JDBC use case only and not for 
other use cases of FetchTask. Further, it appears that FetchTask.mSerde will do 
ctrl-serialization if it is initialized (via the properties "columns" and 
"columns.types") with the proper schema.

Are these right? Pointers on how to get the proper schema? (From 
FetchTask.work?) And on how to restrict the change to JDBC only? (I have no 
idea.)

For symmetry, LazySimpleSerDe should be used to do ctrl-deserialization on the 
client side, per Zheng's suggestion.
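
A minimal sketch of that client-side deserialization, assuming the driver gets 
one ctrl-delimited row per fetch; the column names and types here are invented 
for illustration:

import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.SerDeUtils;
import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;
import org.apache.hadoop.io.Text;

public class ClientSideDeserialize {
  public static void main(String[] args) throws Exception {
    Properties schema = new Properties();
    schema.setProperty("columns", "userid,prefs");
    schema.setProperty("columns.types", "int,map<string,string>");

    LazySimpleSerDe serde = new LazySimpleSerDe();
    serde.initialize(new Configuration(), schema);  // ctrl-A/B/C defaults

    // \001 = field delimiter, \002 = map entry delimiter, \003 = key/value delimiter
    Object row = serde.deserialize(new Text("42\001color\003blue\002size\003m"));
    System.out.println(
        SerDeUtils.getJSONString(row, serde.getObjectInspector()));
  }
}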

Steven


From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Monday, August 16, 2010 3:57 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

I think the call to client.fetchOne should use delimited format, so that 
DynamicSerDe can deserialize it.
This should be a good short-term fix.

Also, on a higher level, DynamicSerDe is deprecated. It will be great to use 
LazySimpleSerDe.

[jira] Resolved: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1598.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks, Ning.

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1598.2.patch, HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1016) Ability to access DistributedCache from UDFs

2010-08-31 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904860#action_12904860
 ] 

Carl Steinbach commented on HIVE-1016:
--

@Namit: GenericUDF.initialize() is called both at compile-time and run-time.

> Ability to access DistributedCache from UDFs
> 
>
> Key: HIVE-1016
> URL: https://issues.apache.org/jira/browse/HIVE-1016
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1016.1.patch.txt
>
>
> There have been several requests on the mailing list for
> information about how to access the DistributedCache from UDFs, e.g.:
> http://www.mail-archive.com/hive-u...@hadoop.apache.org/msg01650.html
> http://www.mail-archive.com/hive-u...@hadoop.apache.org/msg01926.html
> While responses to these emails suggested several workarounds, the only 
> correct
> way of accessing the distributed cache is via the static methods of Hadoop's
> DistributedCache class, and all of these methods require that the JobConf be 
> passed
> in as a parameter. Hence, giving UDFs access to the distributed cache
> reduces to giving UDFs access to the JobConf.
> I propose the following changes to GenericUDF/UDAF/UDTF:
> * Add an exec_init(Configuration conf) method that is called during Operator 
> initialization at runtime.
> * Change the name of the "initialize" method to "compile_init" to make it 
> clear that this method is called at compile-time.
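
A hedged sketch of what a UDF might look like under this proposal; the 
exec_init() hook is the proposal's, not part of the current GenericUDF API, 
and the class below is purely illustrative:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericUDFCacheExample extends GenericUDF {
  private Path[] localFiles;

  // Proposed runtime hook: called during Operator initialization on the task
  // side, giving the UDF access to the job configuration (and thereby the
  // DistributedCache).
  public void exec_init(Configuration conf) throws HiveException {
    try {
      localFiles = DistributedCache.getLocalCacheFiles(conf);
    } catch (IOException e) {
      throw new HiveException(e);
    }
  }

  // Today's initialize(), which the proposal would rename to compile_init().
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // Trivial body for the sketch: report the first localized cache file.
    return (localFiles == null || localFiles.length == 0)
        ? null : localFiles[0].toString();
  }

  @Override
  public String getDisplayString(String[] children) {
    return "cache_example()";
  }
}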

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security

2010-08-31 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-1264:
-

Assignee: Todd Lipcon  (was: Venkatesh S)

> Make Hive work with Hadoop security
> ---
>
> Key: HIVE-1264
> URL: https://issues.apache.org/jira/browse/HIVE-1264
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Jeff Hammerbacher
>Assignee: Todd Lipcon
> Attachments: HiveHadoop20S_patch.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-08-31 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904834#action_12904834
 ] 

John Sichi commented on HIVE-1546:
--

+1.  Will commit when tests pass.


> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546-3.patch, hive-1546.patch, hive-1546_2.patch
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1130) Create argmin and argmax

2010-08-31 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904762#action_12904762
 ] 

John Sichi commented on HIVE-1130:
--

Comparing with GenericUDAFMax, I noticed that function uses different inputOI 
and outputOI, whereas you use a single xInputOI for both.

  outputOI = ObjectInspectorUtils.getStandardObjectInspector(inputOI,
      ObjectInspectorCopyOption.JAVA);

Maybe you need to follow that pattern?
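
Roughly, the pattern looks like this (a sketch from memory, not the actual 
GenericUDAFMax source; the field and variable names just mirror the ones in 
this discussion):

// in the evaluator's init(): keep the original OI for reading parameters and
// derive a standard OI for the values that get stored and returned
inputOI = parameters[0];
outputOI = ObjectInspectorUtils.getStandardObjectInspector(inputOI,
    ObjectInspectorCopyOption.JAVA);

// when updating the aggregation buffer: copy the (possibly lazy) parameter
// into a standard Java object, so later compare() calls see matching types
if (myagg.max == null
    || ObjectInspectorUtils.compare(px, inputOI, myagg.max, outputOI) > 0) {
  myagg.max = ObjectInspectorUtils.copyToStandardObject(px, inputOI,
      ObjectInspectorCopyOption.JAVA);
}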


> Create argmin and argmax
> 
>
> Key: HIVE-1130
> URL: https://issues.apache.org/jira/browse/HIVE-1130
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Zheng Shao
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1130.1.patch, HIVE-1130.2.patch
>
>
> With HIVE-1128, users can already do what argmax and argmin does.
> But it will be helpful if we provide these functions explicitly so people 
> from maths/stats background can use it more easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: dropping support for pre-0.20 Hadoop versions

2010-08-31 Thread John Sichi
On Aug 31, 2010, at 3:13 AM, S. Venkatesh wrote:

> I think this is a huge benefit for all of us. Looking forward to it.
> Any time line you have in mind?
> 
> Thanks,
> Venkatesh


One issue which was raised at the last Hive contributor meeting was that Hive's 
new indexing support relies on getting the file offset while reading rows, but 
getPos has gone away.  So we're going to need to come up with a resolution for 
that.

JVS



[jira] Created: (HIVE-1608) use sequencefile as the default for storing intermediate results

2010-08-31 Thread Namit Jain (JIRA)
use sequencefile as the default for storing intermediate results


 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


The only argument for having a text file for storing intermediate results seems 
to be better debuggability.

But tailing a sequence file is possible, and it should be more space-efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.2.patch

Attached the test case and also removed some debugging info. These are the only 
changes. 

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.2.patch, HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1598:
-

Status: Open  (was: Patch Available)

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904693#action_12904693
 ] 

Namit Jain commented on HIVE-1598:
--

Ning, can you add the test which was failing with TextFile ('\n' in the data)?

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1545) Add a bunch of UDFs and UDAFs

2010-08-31 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1545:
-

Component/s: UDF

> Add a bunch of UDFs and UDAFs
> -
>
> Key: HIVE-1545
> URL: https://issues.apache.org/jira/browse/HIVE-1545
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Jonathan Chang
>Assignee: Jonathan Chang
>Priority: Minor
> Attachments: udfs.tar.gz
>
>
> Here some UD(A)Fs which can be incorporated into the Hive distribution:
> UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 
> 5, 3) returns 1.
> UDFBucket - Find the bucket in which the first argument belongs. e.g., 
> BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} 
> but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-index of the first element in the array given as 
> the second argument. Returns 0 if not found. Returns NULL if either argument 
> is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, 
> array(1,2,3)) will return 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two 
> lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 
> whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 
> 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches 
> in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, return the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts it into 
> a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Compute the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two 
> columns.
> UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value 
> of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated 
> with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1130) Create argmin and argmax

2010-08-31 Thread Pierre Huyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904650#action_12904650
 ] 

Pierre Huyn commented on HIVE-1130:
---

Not sure why the rest of my message was cut off. It is still in my sent mail. 
Here is the entire message:

Hi John,

It appears that the call ObjectInspectorUtils.compare(myagg.max, xInputOI, px, 
xInputOI), which tries to compare 2 integer writables, failed in casting. Here 
is a fragment of the stack trace:

...
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable 
cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:37)
at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.get(LazyIntObjectInspector.java:38)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:497)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFArgMax$GenericUDAFArgMaxEvaluator.internalMerge(GenericUDAFArgMax.java:165)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFArgMax$GenericUDAFArgMaxEvaluator.iterate(GenericUDAFArgMax.java:142)
...

Not sure why the casting fails. Both myagg.max and px were passed in as 
arguments of iterate(). Their data types are not known at compile time. That is 
the only reason why I use ObjectInspectorUtils.compare() to compare them. Do 
you see any problem with that?

--- Pierre





> Create argmin and argmax
> 
>
> Key: HIVE-1130
> URL: https://issues.apache.org/jira/browse/HIVE-1130
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Zheng Shao
>Assignee: Pierre Huyn
> Fix For: 0.7.0
>
> Attachments: HIVE-1130.1.patch, HIVE-1130.2.patch
>
>
> With HIVE-1128, users can already do what argmax and argmin does.
> But it will be helpful if we provide these functions explicitly so people 
> from maths/stats background can use it more easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Attachment: HIVE-1598.patch

This patch only adds support for using SequenceFile for query results. There are 
still questions on whether we should use it for the script operator or not. Will 
open another JIRA if needed.

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1598) use SequenceFile rather than TextFile format for hive query results

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1598:
-

Status: Patch Available  (was: Open)

All 0.17 & 0.20 tests passed.

> use SequenceFile rather than TextFile format for hive query results
> ---
>
> Key: HIVE-1598
> URL: https://issues.apache.org/jira/browse/HIVE-1598
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1598.patch
>
>
> Hive query results are written to a temporary directory first, and then 
> FetchTask takes the files and displays them to the users. Currently the file 
> format used for the result files is TextFile. This could cause incorrect 
> result display if some string-typed column contains newlines, which are used 
> as record delimiters in TextInputFormat. Switching to SequenceFile format 
> will solve this problem. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1607) Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675

2010-08-31 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1607:
-

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Carl!

> Reinstate and deprecate IMetaStoreClient methods removed in HIVE-675
> 
>
> Key: HIVE-1607
> URL: https://issues.apache.org/jira/browse/HIVE-1607
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
> Attachments: HIVE-1607.1.patch.txt, HIVE-1607.2.patch.txt
>
>
> Several methods were removed from the IMetaStoreClient interface as part of 
> HIVE-675:
> {code}
>   /**
>* Drop the table.
>*
>* @param tableName
>*  The table to drop
>* @param deleteData
>*  Should we delete the underlying data
>* @throws MetaException
>*   Could not drop table properly.
>* @throws UnknownTableException
>*   The table wasn't found.
>* @throws TException
>*   A thrift communication error occurred
>* @throws NoSuchObjectException
>*   The table wasn't found.
>*/
>   public void dropTable(String tableName, boolean deleteData)
>   throws MetaException, UnknownTableException, TException,
>   NoSuchObjectException;
>   /**
>* Get a table object.
>*
>* @param tableName
>*  Name of the table to fetch.
>* @return An object representing the table.
>* @throws MetaException
>*   Could not fetch the table
>* @throws TException
>*   A thrift communication error occurred
>* @throws NoSuchObjectException
>*   In case the table wasn't found.
>*/
>   public Table getTable(String tableName) throws MetaException, TException,
>   NoSuchObjectException;
>   public boolean tableExists(String databaseName, String tableName)
>   throws MetaException, TException, UnknownDBException;
> {code}
> These methods should be reinstated with a deprecation warning.
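
For illustration, a hedged sketch of what reinstating one of these with a 
deprecation marker might look like (the delegation to the database-qualified 
overload and the default-database constant are assumptions, not taken from the 
attached patches):

  /**
   * @deprecated As of HIVE-675, use the overload that takes an explicit
   *             database name.
   */
  @Deprecated
  public Table getTable(String tableName) throws MetaException, TException,
      NoSuchObjectException {
    // Assumes the pre-HIVE-675 behavior of resolving against the default database.
    return getTable(MetaStoreUtils.DEFAULT_DATABASE_NAME, tableName);
  }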

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: dropping support for pre-0.20 Hadoop versions

2010-08-31 Thread S. Venkatesh
I think this is a huge benefit for all of us. Looking forward to it.
Any time line you have in mind?

Thanks,
Venkatesh

On Tue, Aug 31, 2010 at 12:33 AM, John Sichi  wrote:
> As Carl mentioned below, there was agreement at the last Hive contributor
> meeting that we should drop support for pre-0.20 Hadoop versions in Hive
> trunk.  This means that starting with the Hive 0.7 release, Hadoop 0.20 or
> later will be required.  Anyone stuck on an earlier Hadoop version will need
> to remain on Hive 0.6 and backport any patches they need from trunk.
> There are two major benefits to this:
> * we can finally move from mapred to mapreduce APIs across all of Hive
> * we'll enjoy a significant reduction in code maintenance and testing
> overhead (not to mention commit latency) for Hive contributors and
> committers
> Note that although we'll delete the pre-0.20 shim implementations, we will
> still keep the generic shim mechanism itself in place so that we can
> continue to support multiple Hadoop API versions as new ones are released in
> the future.
> For those who were not present at the contributor meeting, please speak up
> if you have an opinion on this.
> JVS
> On Aug 28, 2010, at 2:59 AM, Carl Steinbach wrote:
>
> August 8th, 2010
>
> Yongqiang He gave a presentation about his work on index support in Hive.
>
> Slides are available here: http://files.meetup.com/1658206/Hive%20Index.pptx
>
> John Sichi talked about his work on filter-pushdown optimizations. This is
> applicable to the HBase storage handler and the new index infrastructure.
> Pradeep Kamath gave an update on progress with Howl.
>
> The Howl source code is available
> on GitHub here: http://github.com/yahoo/howl
> Starting to work on security for Howl. For the first iteration the plan is
> to base it on DFS permissions.
>
> General agreement that we should aim to desupport pre-0.20.0 versions of
> Hadoop in Hive 0.7.0. This will allow us to remove the shim layer and will
> make it easier to transition to the new mapreduce APIs. But we also want to
> get a better idea of how many users are stuck on pre-0.20 versions of
> Hadoop.
> Remove Thrift generated code from repository.
>
> Pro: reduce noise in diffs during reviews.
> Con: requires developers to install Thrift compiler.
>
> Discussed moving the documentation from the wiki to version control.
>
> Probably not practical to maintain the trunk version of the docs on the wiki
> and roll over to version control at release time, so trunk version of docs
> will be maintained in vcs.
> It was agreed that feature patches should include updates to the docs, but
> it is also acceptable to file a doc ticket if there is time pressure to
> commit.
> Will maintain an errata page on the wiki for collecting updates/corrections
> from users. These notes will be rolled into the documentation in vcs on a
> monthly basis.
>
> The next meeting will be held in September at Cloudera's office in Palo
> Alto.
>



-- 
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to
add, but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry