Re: Hive support to cassandra

2010-06-16 Thread tom kersnick
You are not being rude, Jeff.  This is a request from the client due to the ease
of use of Cassandra compared to HBase.  I'm with you on this.  They are
looking for apples-to-apples consistency: easy migration of data from OLTP
(Cassandra) to their data warehouse (also Cassandra?).  Apparently that is not the
case.  Is it possible to migrate from Cassandra to HBase?  Any documentation on
this type of push from Cassandra to HBase would be helpful.

Thanks in advance.

/tom





On Wed, Jun 16, 2010 at 5:44 PM, Jeff Hammerbacher wrote:

> Hey Tom,
>
> I don't want to be rude, but if you're using Cassandra for your data
> warehouse environment, you're doing it wrong. HBase is the primary focus for
> integration with Hive (see
> http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/, for
> example). Cassandra is a great choice for an OLTP application, but certainly
> not for a data warehouse.
>
> Later,
> Jeff
>
> On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick  wrote:
>
> > Quick question for all of you.  It seems that there is more movement using
> > Hive with HBase rather than Cassandra.  Do you see this changing in the near
> > future?  I have a client who is interested in using Cassandra due to the
> > ease of maintenance.  They are planning on using Cassandra for both their
> > data warehouse and OLTP environments.  Thoughts?
> >
> > I saw this ticket and I wanted to ask.
> >
> > Thanks in advance.
> >
> > /tom
> >
> >
> > On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo  > >wrote:
> >
> > > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> > wrote:
> > >
> > > > > All,
> > > > >
> > > > > http://code.google.com/soc/.
> > > > >
> > > > > It is an interesting thing that Google offers stipends to get open source
> > > > > code written. However, last year I was interested in a project that did
> > > > > NOT get accepted into GSOC. It was quite deflating to be not
> > > > > accepted/rejected.
> > > > >
> > > > > Money does make the world go around, and if we all had plenty of money we
> > > > > would all have more time to write open source code :) But on the chance
> > > > > your application does get rejected, consider doing it anyway!
> > > > >
> > > > > Edward
> > > > >
> > > >
> > > > Definitely Edward, Thanks for the suggestion :)
> > > >
> > > > shirish
> > > >
> > >
> > > I did not see any Cassandra or Hive SOC projects at
> > > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> > > So if no one is going to pick this Cassandra interface up, I will pick it up
> > > after I close some pending things. That is two strikes for me and GSOC.
> > >
> >
>


[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-16 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: wtf.png

Cool on adding the logo... but something went wrong here. Unless I applied the
patch incorrectly, the left table now looks wrong. Check the screenshot.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, 
> wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and
> users running older/newer branches cannot locate relevant documentation for
> their branch.
> ...an example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way Hadoop & HBase do, inline,
> using Forrest. I would like to take the lead on this, but we need a lot of
> consensus on doing this properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879648#action_12879648
 ] 

John Sichi commented on HIVE-1364:
--

I'm not sure we need to use the deprecation approach.  In Java land, it's all 
just String regardless of the underlying character precision in the DB.

For existing metastores, people are already going to need to run upgrade SQL
commands against their metastore DBs when upgrading to 0.6 because of the new
support for views.  We can just add on to those scripts.
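
To make the "add on to those scripts" idea concrete, the kind of upgrade step involved might look roughly like this (MySQL syntax; the identifiers follow the SERDEPROPERTIES.PARAM_VALUE naming used in this issue, and the actual metastore upgrade script may use different table/column names):

```sql
-- Hypothetical metastore upgrade step (MySQL). Identifier names follow this
-- issue's SERDEPROPERTIES.PARAM_VALUE wording; the real script may differ.
ALTER TABLE SERDEPROPERTIES MODIFY COLUMN PARAM_VALUE VARCHAR(8192);
```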


> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




[jira] Updated: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1364:
-

Status: Patch Available  (was: Open)

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-16 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879643#action_12879643
 ] 

Carl Steinbach commented on HIVE-1364:
--

@Prasad: It's possible that the people who ran into problems before were using
a version of MySQL older than 5.0.3. Those versions supported a 255 byte max
length for VARCHARs. It's also possible that older versions of the package.jdo
mapping contained more indexes, in which case the 767 byte limit holds. Also,
UTF encoding should not make a difference, since these are byte lengths, not
character lengths.
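
The byte-versus-character distinction can be illustrated with a small standalone sketch (plain Java, unrelated to Hive's code): a 2-byte UTF-8 character doubles the encoded byte length without changing the character count.

```java
import java.nio.charset.StandardCharsets;

// Standalone illustration: character length vs. UTF-8 byte length.
public class ByteVsCharDemo {
    public static void main(String[] args) {
        // "é" (U+00E9) encodes to 2 bytes in UTF-8.
        String s = "é".repeat(400);  // String.repeat requires Java 11+
        System.out.println(s.length());                                // 400 characters
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 800 bytes
    }
}
```

So a limit counted in bytes is reached well before the same number of characters once non-ASCII data is stored.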

@John: I think using LOBs is the right approach, but perhaps we should handle 
that problem in a different ticket? I don't think we can just change the 
mapping to use LOB instead of VARCHAR, and will instead have to add a new LOB 
column, deprecate the old VARCHAR column, and create an accessor that is 
capable of using either column.
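
A minimal sketch of that either-column accessor might look like this (all class and field names here are hypothetical, not Hive's actual metastore mapping):

```java
/**
 * Hypothetical sketch of an accessor that reads a serde property value from
 * either a new LOB-backed column or the deprecated VARCHAR(767) column.
 * Names are illustrative only, not Hive's real O/R mapping.
 */
public class SerdeParamValue {
    private final String lobValue;     // new LOB column; null for pre-upgrade rows
    private final String varcharValue; // deprecated VARCHAR(767) column

    public SerdeParamValue(String lobValue, String varcharValue) {
        this.lobValue = lobValue;
        this.varcharValue = varcharValue;
    }

    /** Prefer the new LOB column, falling back to the legacy column. */
    public String getParamValue() {
        return lobValue != null ? lobValue : varcharValue;
    }
}
```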


> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




[jira] Updated: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1364:
-

Attachment: HIVE-1364.2.patch.txt

HIVE-1364.2.patch.txt:
* Change PARTITIONS.PART_NAME max length back to 767


> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.2.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879635#action_12879635
 ] 

John Sichi commented on HIVE-1364:
--

Also, PART_NAME in table PARTITIONS needs to remain as is, since it is covered 
by an index.


> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0
>
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore O/R mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




RE: Hive support to cassandra

2010-06-16 Thread John Sichi
Just to clarify (since I mentioned Hypertable and Cassandra in that blog post),
Facebook's own integration efforts are currently going into Hive+HBase alone,
but for the Hive project as a whole, we'd be happy to see storage handlers
beyond HBase.  Someone from Hypertable has been working on one and asking
questions on hive-user.  At talks I have given, a number of people have
expressed interest in Cassandra, but so far I haven't seen anyone take
ownership of that after the GSoC project was a no-go.

Each technology has its own pros and cons, which I'll stay out of here, but I
will say that I believe Hive can be useful as a scale-out data
integration/transformation technology, even for stores that are unsuited for
data warehousing.

JVS


From: Jeff Hammerbacher [ham...@cloudera.com]
Sent: Wednesday, June 16, 2010 5:44 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive support to cassandra

Hey Tom,

I don't want to be rude, but if you're using Cassandra for your data
warehouse environment, you're doing it wrong. HBase is the primary focus for
integration with Hive (see
http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/, for
example). Cassandra is a great choice for an OLTP application, but certainly
not for a data warehouse.

Later,
Jeff

On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick  wrote:

> Quick question for all of you.  It seems that there is more movement using
> Hive with HBase rather than Cassandra.  Do you see this changing in the near
> future?  I have a client who is interested in using Cassandra due to the
> ease of maintenance.  They are planning on using Cassandra for both their
> data warehouse and OLTP environments.  Thoughts?
>
> I saw this ticket and I wanted to ask.
>
> Thanks in advance.
>
> /tom
>
>
> On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo  >wrote:
>
> > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> wrote:
> >
> > > > All,
> > > >
> > > > http://code.google.com/soc/.
> > > >
> > > > It is an interesting thing that Google offers stipends to get open source
> > > > code written. However, last year I was interested in a project that did
> > > > NOT get accepted into GSOC. It was quite deflating to be not
> > > > accepted/rejected.
> > > >
> > > > Money does make the world go around, and if we all had plenty of money we
> > > > would all have more time to write open source code :) But on the chance
> > > > your application does get rejected, consider doing it anyway!
> > > >
> > > > Edward
> > > >
> > >
> > > Definitely Edward, Thanks for the suggestion :)
> > >
> > > shirish
> > >
> >
> > I did not see any Cassandra or Hive SOC projects at
> > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> > So if no one is going to pick this Cassandra interface up, I will pick it up
> > after I close some pending things. That is two strikes for me and GSOC.
> >
>


[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1135:
-

Summary: Use Anakia for version controlled documentation  (was: Move hive 
language manual and tutorial to version control)

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and
> users running older/newer branches cannot locate relevant documentation for
> their branch.
> ...an example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way Hadoop & HBase do, inline,
> using Forrest. I would like to take the lead on this, but we need a lot of
> consensus on doing this properly.




[jira] Commented: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-16 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879603#action_12879603
 ] 

Carl Steinbach commented on HIVE-1135:
--

I'm +1 on committing this (either Ed's last patch, or the one I just added that 
contains three very small tweaks).


> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and
> users running older/newer branches cannot locate relevant documentation for
> their branch.
> ...an example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way Hadoop & HBase do, inline,
> using Forrest. I would like to take the lead on this, but we need a lot of
> consensus on doing this properly.




[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1135:
-

Attachment: hive-1135-4-patch.txt

* Fixed broken image links in the stylesheet.
* Moved docs/docs/ to docs/xdocs/
* Added description to ant 'docs' target.

> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and
> users running older/newer branches cannot locate relevant documentation for
> their branch.
> ...an example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way Hadoop & HBase do, inline,
> using Forrest. I would like to take the lead on this, but we need a lot of
> consensus on doing this properly.




Re: Hive support to cassandra

2010-06-16 Thread Jeff Hammerbacher
Hey Tom,

I don't want to be rude, but if you're using Cassandra for your data
warehouse environment, you're doing it wrong. HBase is the primary focus for
integration with Hive (see
http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/, for
example). Cassandra is a great choice for an OLTP application, but certainly
not for a data warehouse.

Later,
Jeff

On Wed, Jun 16, 2010 at 3:22 PM, tom kersnick  wrote:

> Quick question for all of you.  It seems that there is more movement using
> Hive with HBase rather than Cassandra.  Do you see this changing in the near
> future?  I have a client who is interested in using Cassandra due to the
> ease of maintenance.  They are planning on using Cassandra for both their
> data warehouse and OLTP environments.  Thoughts?
>
> I saw this ticket and I wanted to ask.
>
> Thanks in advance.
>
> /tom
>
>
> On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo  >wrote:
>
> > On Thu, Apr 8, 2010 at 1:17 PM, shirish 
> wrote:
> >
> > > > All,
> > > >
> > > > http://code.google.com/soc/.
> > > >
> > > > It is an interesting thing that Google offers stipends to get open source
> > > > code written. However, last year I was interested in a project that did
> > > > NOT get accepted into GSOC. It was quite deflating to be not
> > > > accepted/rejected.
> > > >
> > > > Money does make the world go around, and if we all had plenty of money we
> > > > would all have more time to write open source code :) But on the chance
> > > > your application does get rejected, consider doing it anyway!
> > > >
> > > > Edward
> > > >
> > >
> > > Definitely Edward, Thanks for the suggestion :)
> > >
> > > shirish
> > >
> >
> > I did not see any Cassandra or Hive SOC projects at
> > http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> > So if no one is going to pick this Cassandra interface up, I will pick it up
> > after I close some pending things. That is two strikes for me and GSOC.
> >
>


[jira] Commented: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-16 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879586#action_12879586
 ] 

HBase Review Board commented on HIVE-1135:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/178/
---

(Updated 2010-06-16 16:51:58.636162)


Review request for Hive Developers.


Changes
---

Updated diff.


Summary
---

Submitted on behalf of Ed Capriolo.


This addresses bug hive-1135.
http://issues.apache.org/jira/browse/hive-1135


Diffs (updated)
-

  trunk/build.xml 955109 
  trunk/docs/docs/index.xml PRE-CREATION 
  trunk/docs/docs/language_manual/data-manipulation-statements.xml PRE-CREATION 
  trunk/docs/docs/language_manual/working_with_bucketed_tables.xml PRE-CREATION 
  trunk/docs/site.css PRE-CREATION 
  trunk/docs/stylesheets/project.xml PRE-CREATION 
  trunk/docs/stylesheets/site.vsl PRE-CREATION 
  trunk/docs/velocity.properties PRE-CREATION 

Diff: http://review.hbase.org/r/178/diff


Testing
---


Thanks,

Carl




> Move hive language manual and tutorial to version control
> -
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and
> users running older/newer branches cannot locate relevant documentation for
> their branch.
> ...an example of a perception I do not think we want to give off:
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation the way Hadoop & HBase do, inline,
> using Forrest. I would like to take the lead on this, but we need a lot of
> consensus on doing this properly.




Re: Review Request: HIVE-1135: Move hive language manual from wiki to SVN

2010-06-16 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/178/
---

(Updated 2010-06-16 16:51:58.636162)


Review request for Hive Developers.


Changes
---

Updated diff.


Summary
---

Submitted on behalf of Ed Capriolo.


This addresses bug hive-1135.
http://issues.apache.org/jira/browse/hive-1135


Diffs (updated)
-

  trunk/build.xml 955109 
  trunk/docs/docs/index.xml PRE-CREATION 
  trunk/docs/docs/language_manual/data-manipulation-statements.xml PRE-CREATION 
  trunk/docs/docs/language_manual/working_with_bucketed_tables.xml PRE-CREATION 
  trunk/docs/site.css PRE-CREATION 
  trunk/docs/stylesheets/project.xml PRE-CREATION 
  trunk/docs/stylesheets/site.vsl PRE-CREATION 
  trunk/docs/velocity.properties PRE-CREATION 

Diff: http://review.hbase.org/r/178/diff


Testing
---


Thanks,

Carl



[jira] Resolved: (HIVE-806) Hive with HBase as data store to support MapReduce and direct query

2010-06-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-806.
-

Resolution: Incomplete

Marking this one incomplete.  If there's still interest in any of the material 
here, please create new JIRA issue(s) with the details.

> Hive with HBase as data store to support MapReduce and direct query
> ---
>
> Key: HIVE-806
> URL: https://issues.apache.org/jira/browse/HIVE-806
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: HBase Handler
>Reporter: Schubert Zhang
>
> Currently Hive uses only HDFS as the underlying data store; it can query and
> analyze files in HDFS via MapReduce.
> But in some engineering cases, our data are stored/organised/indexed in HBase
> or other data stores. This JIRA issue will implement Hive using HBase as the
> data store.  And besides supporting MapReduce on HBase, we will support
> direct queries on HBase.
> This is a brother jira-issue of HIVE-705 (Let Hive can analyse hbase's 
> tables, https://issues.apache.org/jira/browse/HIVE-705). Because this 
> implementation and use cases have some differences from HIVE-705, this 
> jira-issue is created to avoid confusions. It is possible to combine the two 
> issues in the future.
> Initial developers: Kula Liao, Stephen Xie, Tao Jiang and Schubert Zhang.




Re: Hive support to cassandra

2010-06-16 Thread tom kersnick
Quick question for all of you.  It seems that there is more movement using
Hive with HBase rather than Cassandra.  Do you see this changing in the near
future?  I have a client who is interested in using Cassandra due to the
ease of maintenance.  They are planning on using Cassandra for both their
data warehouse and OLTP environments.  Thoughts?

I saw this ticket and I wanted to ask.

Thanks in advance.

/tom


On Mon, May 3, 2010 at 12:42 PM, Edward Capriolo wrote:

> On Thu, Apr 8, 2010 at 1:17 PM, shirish  wrote:
>
> > > All,
> > >
> > > http://code.google.com/soc/.
> > >
> > > It is an interesting thing that Google offers stipends to get open source
> > > code written. However, last year I was interested in a project that did
> > > NOT get accepted into GSOC. It was quite deflating to be not
> > > accepted/rejected.
> > >
> > > Money does make the world go around, and if we all had plenty of money we
> > > would all have more time to write open source code :) But on the chance
> > > your application does get rejected, consider doing it anyway!
> > >
> > > Edward
> > >
> >
> > Definitely Edward, Thanks for the suggestion :)
> >
> > shirish
> >
>
> I did not see any Cassandra or Hive SOC projects at
> http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2010 :(
> So if no one is going to pick this Cassandra interface up, I will pick it up
> after I close some pending things. That is two strikes for me and GSOC.
>


[jira] Commented: (HIVE-1229) replace dependencies on HBase deprecated API

2010-06-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879544#action_12879544
 ] 

John Sichi commented on HIVE-1229:
--

Taking a look at this one now.


> replace dependencies on HBase deprecated API
> 
>
> Key: HIVE-1229
> URL: https://issues.apache.org/jira/browse/HIVE-1229
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Basab Maulik
> Attachments: HIVE-1129.1.patch
>
>
> Some of these dependencies are on the old Hadoop mapred packages; others are 
> HBase-specific.  The former have to wait until the rest of Hive moves over to 
> the new Hadoop mapreduce package, but the HBase-specific ones don't have to 
> wait.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1383) allow HBase WAL to be disabled

2010-06-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1383:
-

Status: Patch Available  (was: Open)

Ning, can you take a look at this one?


> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch, 
> HIVE-1383.4.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.
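For reference, the toggle this patch adds is driven from the Hive session. A hedged sketch of its use follows; the property name `hive.hbase.wal.enabled` is my reading of the patch description, not a confirmed setting, so verify it against the committed code:

```
-- Hypothetical usage; confirm the property name in the committed HIVE-1383 patch.
SET hive.hbase.wal.enabled=false;   -- skip the HBase write-ahead log
INSERT OVERWRITE TABLE my_hbase_table SELECT * FROM staging;
SET hive.hbase.wal.enabled=true;    -- restore durability for subsequent writes
```

With the WAL off, a region server crash during the insert can silently lose rows, so this only makes sense when the whole load can be replayed (e.g. a bulk import).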

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1383) allow HBase WAL to be disabled

2010-06-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1383:
-

Attachment: HIVE-1383.4.patch

New patch with HiveConf changes.

> allow HBase WAL to be disabled
> --
>
> Key: HIVE-1383
> URL: https://issues.apache.org/jira/browse/HIVE-1383
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1383.1.patch, HIVE-1383.2.patch, HIVE-1383.3.patch, 
> HIVE-1383.4.patch
>
>
> Disabling WAL can lead to much better INSERT performance in cases where other 
> means of safe recovery (such as bulk import) are available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1255:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks Edward!


> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan
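The scalar semantics behind these UDFs come straight from `java.lang.Math`. A minimal standalone sketch of those semantics (not the actual Hive UDF classes committed here, which wrap these in Hive's UDF plumbing):

```java
// Standalone illustration of the scalar functions added in HIVE-1255.
public class MathUdfSketch {
    static double pi()              { return Math.PI; }
    static double e()               { return Math.E; }
    static double degrees(double r) { return Math.toDegrees(r); }
    static double radians(double d) { return Math.toRadians(d); }
    static double tan(double x)     { return Math.tan(x); }
    static double sign(double x)    { return Math.signum(x); }
    static double atan(double x)    { return Math.atan(x); }

    public static void main(String[] args) {
        // e.g. SELECT degrees(pi()) yields approximately 180.0
        System.out.println(degrees(pi()));
        System.out.println(sign(-42.0));
    }
}
```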

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1255:
-

Summary: Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan 
 (was: Add mathamatical UDFs PI, E, degrees, radians, tan, sign, and atan)

> Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1255) Add mathamatical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879456#action_12879456
 ] 

John Sichi commented on HIVE-1255:
--

+1.  Will commit if tests pass.


> Add mathamatical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal : Hive-trunk-h0.20 #296

2010-06-16 Thread Apache Hudson Server
See 




[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879442#action_12879442
 ] 

Ning Zhang commented on HIVE-1139:
--

Sounds good. 

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
> Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could lead to an OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes fewer rows. A better 
> solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 
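The bounded-memory idea in the description above can be sketched independently of Hive: keep an in-memory map of partial counts up to a threshold, then flush the partials downstream instead of growing without bound (a hypothetical illustration; here a second map stands in for the reducer that re-merges duplicate keys):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of threshold-based partial aggregation (the problem
// area of HIVE-1139): once the in-memory map holds too many distinct keys,
// flush the partial counts downstream rather than risk an OOM.
public class PartialAggSketch {
    final int maxKeys;
    final Map<String, Long> counts = new HashMap<>();
    final Map<String, Long> reducerSide = new HashMap<>();  // stands in for the shuffle/reducer

    PartialAggSketch(int maxKeys) { this.maxKeys = maxKeys; }

    void add(String key) {
        counts.merge(key, 1L, Long::sum);
        if (counts.size() >= maxKeys) flush();  // bound memory at the cost of duplicate partials
    }

    void flush() {
        counts.forEach((k, v) -> reducerSide.merge(k, v, Long::sum));  // reducer re-merges keys
        counts.clear();
    }

    public static void main(String[] args) {
        PartialAggSketch agg = new PartialAggSketch(2);
        for (String k : new String[]{"x", "y", "x", "x", "z"}) agg.add(k);
        agg.flush();  // flush the tail at end of the map task
        System.out.println(agg.reducerSide);  // x=3, y=1, z=1
    }
}
```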

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879441#action_12879441
 ] 

Arvind Prabhakar commented on HIVE-1176:


bq. Can you elaborate on what you mean by 'some collections were being fetched 
as semi-populated proxies with missing session context leading to NPEs'? Is 
there something I can do to reproduce this?

@Paul: Here are the steps to reproduce this problem:

# Start out with a clean workspace checkout and apply the updated patch 
HIVE-1176-2.patch. 
# Manually revert the file 
{{metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java}}
 to its previous state
# run {{ant package}} from the root of the workspace
# run {{ant test}} from within metastore

You should see failures like the following:
{code}
[junit] testPartition() failed.
[junit] java.lang.NullPointerException
[junit] at 
org.datanucleus.store.mapped.scostore.AbstractMapStore.validateKeyForWriting(AbstractMapStore.java:333)
[junit] at 
org.datanucleus.store.mapped.scostore.JoinMapStore.put(JoinMapStore.java:252)
[junit] at org.datanucleus.sco.backed.Map.put(Map.java:640)
[junit] at 
org.apache.hadoop.hive.metastore.api.Table.putToParameters(Table.java:359)
[junit] at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table(HiveMetaStore.java:1281)
[junit] at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:140)
[junit] at 
org.apache.hadoop.hive.metastore.TestHiveMetaStore.testAlterTable(TestHiveMetaStore.java:728)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
{code}

If you look at 
{{src/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java}} you will 
notice that the map at the line causing this exception should be a {{HashMap}}, 
not an {{org.datanucleus.store.mapped.scostore.AbstractMapStore}} as the stack 
trace indicates. This happens because the DataNucleus JDO framework replaces 
collections with its own implementations to allow lazy dereferencing and to 
optimize database connections, queries, memory consumption, etc.

Lazy loading of collections (and second class objects in general) can be 
disabled at a global level or at the entity level. Disabling it globally is 
generally not recommended unless there is evidence, backed by extensive 
testing, that supports the change. Disabling at the entity level is still OK 
provided the entity object graph is fully dereferenced at all times; this could 
lead to heavy memory consumption if the entity graph is huge. 

My approach towards fixing the problem was to *not* change the default behavior 
in the general case. Instead I felt that it was better to circumvent this 
problem in the case of a remote metastore by creating a copy explicitly. If you 
have other suggestions on how to address this, please let me know.

Also, more information on the lazy dereferencing mechanism used by the 
DataNucleus framework can be found [here|http://www.datanucleus.org/plugins/core/sco.html].


> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> 

[jira] Commented: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879432#action_12879432
 ] 

Arvind Prabhakar commented on HIVE-1176:


The updated patch HIVE-1176-2.patch contains the following changes:

#   modified:   build.properties
#   modified:   build.xml
#   modified:   eclipse-templates/.classpath
#   modified:   ivy/ivysettings.xml
#   deleted:    lib/datanucleus-core-1.1.2.LICENSE
#   deleted:    lib/datanucleus-core-1.1.2.jar
#   deleted:    lib/datanucleus-enhancer-1.1.2.LICENSE
#   deleted:    lib/datanucleus-enhancer-1.1.2.jar
#   deleted:    lib/datanucleus-rdbms-1.1.2.LICENSE
#   deleted:    lib/datanucleus-rdbms-1.1.2.jar
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.LICENSE
#   deleted:    lib/jdo2-api-2.3-SNAPSHOT.jar
#   modified:   metastore/ivy.xml
#   modified:   
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-06-16 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-1176:
---

Attachment: HIVE-1176-2.patch

Updating the patch against the latest trunk. This is necessary because 
HIVE-1373 changed the Eclipse classpath to add connection-pool libraries, which 
this patch would otherwise make stale. The new version of the patch accounts 
for this by pointing the Eclipse classpath at the updated libraries. I tested 
the launch configuration via Eclipse to make sure it works.

> 'create if not exists' fails for a table name with 'select' in it
> -
>
> Key: HIVE-1176
> URL: https://issues.apache.org/jira/browse/HIVE-1176
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Arvind Prabhakar
> Fix For: 0.6.0
>
> Attachments: HIVE-1176-1.patch, HIVE-1176-2.patch, 
> HIVE-1176.lib-files.tar.gz, HIVE-1176.patch
>
>
> hive> create table if not exists tmp_select(s string, c string, n int);
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: javax.jdo.JDOUserException JDOQL Single-String query should always 
> start with SELECT)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
> at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException 
> JDOQL Single-String query should always start with SELECT)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
> ... 15 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1255) Add mathamatical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-06-16 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879423#action_12879423
 ] 

Paul Yang commented on HIVE-1255:
-

Looks good +1

> Add mathamatical UDFs PI, E, degrees, radians, tan, sign, and atan
> --
>
> Key: HIVE-1255
> URL: https://issues.apache.org/jira/browse/HIVE-1255
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1255-patch-2.txt, hive-1255-patch-3.txt, 
> hive-1255-patch-4.txt, hive-1255-patch.txt
>
>
> Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics

2010-06-16 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879412#action_12879412
 ] 

Joydeep Sen Sarma commented on HIVE-1408:
-

This is somewhat more complicated than I had bargained for:

- We choose local/HDFS files at query compile time based on the local mode 
setting; however, we won't choose local mode until query compilation is 
complete.
- We choose whether to submit the job via a child JVM before the point at which 
the pre-hook is called, and we (currently) have to submit the job via a child 
JVM for local mode.
- Hooks don't have access to map-reduce plans, or to whether there are any 
script operators (for instance).

So it's not possible to implement this via hooks (and the changes required are 
somewhat invasive).

> add option to let hive automatically run in local mode based on tunable 
> heuristics
> --
>
> Key: HIVE-1408
> URL: https://issues.apache.org/jira/browse/HIVE-1408
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> as a followup to HIVE-543 - we should have a simple option (enabled by 
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is 
> automatically chosen
> 2. Options to control different heuristics, some naive examples:
>  hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode 
> if data > 1G
>  hive.exec.mode.local.auto.script.enable=true/false // choose if local 
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to 
> provide this as a standard hook in the hive codebase since it's likely to 
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per 
> hive-task (ie. hadoop job) level. per job-level requires more changes to 
> compilation (to not pre-commit to hdfs or local scratch directories at 
> compile time).
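Under this proposal, a session would opt in roughly like this (these are the ticket's proposed option names, not committed settings, so treat them as illustrative):

```
set hive.exec.mode.local.auto=true;               -- let hive pick local mode automatically
set hive.exec.mode.local.auto.input.size.max=1G;  -- but never for inputs over 1G
select count(1) from small_table;                 -- small enough to run in-process
```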

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys

2010-06-16 Thread Soundararajan Velu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879406#action_12879406
 ] 

Soundararajan Velu commented on HIVE-1139:
--

Thanks Ning, that sounds logical. I will try with 0.15 and tune accordingly in 
our environment, but in the long run I guess we may need a strong 
reflection-based serde map. I am still exploring whether that can be achieved 
and will keep you posted on progress.

> GroupByOperator sometimes throws OutOfMemory error when there are too many 
> distinct keys
> 
>
> Key: HIVE-1139
> URL: https://issues.apache.org/jira/browse/HIVE-1139
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
>Reporter: Ning Zhang
>Assignee: Arvind Prabhakar
> Attachments: PersistentMap.zip
>
>
> When a partial aggregation is performed on a mapper, a HashMap is created to 
> keep all distinct keys in main memory. This could lead to an OOM exception when 
> there are too many distinct keys for a particular mapper. A workaround is to 
> set the map split size smaller so that each mapper takes fewer rows. A better 
> solution is to use the persistent HashMapWrapper (currently used in 
> CommonJoinOperator) to spill overflow rows to disk. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-16 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Status: Patch Available  (was: Open)

Submitting the regenerated patch against the latest trunk. The patch file is 
HIVE-287-3.patch.

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl
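What the failing query should compute can be sketched outside Hive: count the distinct (col1, col2) pairs, treating the pair, not either column alone, as the key. A hypothetical standalone illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of count(distinct col1, col2):
// the distinct key is the *pair* of values, not either column alone.
public class CountDistinctPairs {
    static long countDistinct(List<String[]> rows) {
        Set<List<String>> seen = new HashSet<>();
        for (String[] row : rows) {
            seen.add(Arrays.asList(row[0], row[1]));  // composite key (col1, col2)
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String[]> tbl = Arrays.asList(
            new String[]{"a", "1"},
            new String[]{"a", "1"},   // duplicate pair, counted once
            new String[]{"a", "2"},
            new String[]{"b", "1"});
        System.out.println(countDistinct(tbl));  // 3 distinct (col1, col2) pairs
    }
}
```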

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-287) count distinct on multiple columns does not work

2010-06-16 Thread Arvind Prabhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvind Prabhakar updated HIVE-287:
--

Attachment: HIVE-287-3.patch

> count distinct on multiple columns does not work
> 
>
> Key: HIVE-287
> URL: https://issues.apache.org/jira/browse/HIVE-287
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Arvind Prabhakar
> Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch
>
>
> The following query does not work:
> select count(distinct col1, col2) from Tbl

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #473

2010-06-16 Thread Apache Hudson Server
See 

Changes:

[namit] HIVE-1179. Add UDF array_contains
(Arvind Prabhakar via namit)

[nzhang] HIVE-1410. Add TCP keepalive option for metastore server

[jvs] HIVE-1397. histogram_numeric UDAF
(Mayank Lahiri via jvs)

[namit] HIVE-1409. Use the tableSpec if partitions is not present
(Paul Yang via namit)

--
[...truncated 7070 lines...]
[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: no_hooks.q
[junit] Begin query: noalias_subq1.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: noalias_subq1.q
[junit] Begin query: notable_alias1.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: notable_alias1.q
[junit] Begin query: notable_alias2.q
[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: notable_alias2.q
[junit] Begin query: null_column.q
[junit] plan = 

[junit] plan = 

[junit] plan = 

[junit] plan = 

[junit] plan = 

[junit] diff -a -I file: -I /tmp/ -I invalidscheme: -I lastUpdateTime -I 
lastAccessTime -I owner -I transient_lastDdlTime -I java.lang.RuntimeException 
-I at org -I at sun -I at java -I at junit -I Caused by: -I [.][.][.] [0-9]* 
more 

 

[junit] Done query: null_column.q
[junit] Begin query: nullgroup.q
[junit] plan = 

[junit] plan = 


Re: Vertical partitioning

2010-06-16 Thread Ning Zhang
Hive supports columnar storage (RCFile) but not vertical partitioning. Is there 
a use case for vertical partitioning?

On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:

> Hi,
> 
> Does hive support Vertical partitioning?
> 
> Regards,
> Jaydeep
> 
> 
> 
> The information contained in this communication is intended solely for the 
> use of the individual or entity to whom it is addressed and others authorized 
> to receive it. It may contain confidential or legally privileged information. 
> If you are not the intended recipient you are hereby notified that any 
> disclosure, copying, distribution or taking any action in reliance on the 
> contents of this information is strictly prohibited and may be unlawful. If 
> you have received this communication in error, please notify us immediately 
> by responding to this email and then delete it from your system. The firm is 
> neither liable for the proper and complete transmission of the information 
> contained in this communication nor for any delay in its receipt.



Vertical partitioning

2010-06-16 Thread jaydeep vishwakarma

Hi,

Does hive support Vertical partitioning?

Regards,
Jaydeep


