Re: [VOTE] hive release candidate 0.4.1-rc0
I think there may be a bug still in this release.

hive> select stuff_status from auctions where auction_id='2591238417' and pt='20091027';

auctions is a table partitioned by date, stored as a textfile without compression. The query above should return 0 rows, but when hive.exec.compress.output=true, Hive crashes with a StackOverflowError:

java.lang.StackOverflowError
        at java.lang.ref.FinalReference.<init>(FinalReference.java:16)
        at java.lang.ref.Finalizer.<init>(Finalizer.java:66)
        at java.lang.ref.Finalizer.register(Finalizer.java:72)
        at java.lang.Object.<init>(Object.java:20)
        at java.net.SocketImpl.<init>(SocketImpl.java:27)
        at java.net.PlainSocketImpl.<init>(PlainSocketImpl.java:90)
        at java.net.SocksSocketImpl.<init>(SocksSocketImpl.java:33)
        at java.net.Socket.setImpl(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:68)
        at sun.nio.ch.SocketAdaptor.<init>(SocketAdaptor.java:50)
        at sun.nio.ch.SocketAdaptor.create(SocketAdaptor.java:55)
        at sun.nio.ch.SocketChannelImpl.socket(SocketChannelImpl.java:105)
        at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:58)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1540)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1662)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:96)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:86)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
        at java.io.InputStream.read(InputStream.java:85)
        at org.apache.hadoop.util.LineReader.backfill(LineReader.java:82)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:112)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:134)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:256)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:272)

Each mapper produces an 8-byte deflate file on HDFS (we set hive.merge.mapfiles=false); its hex representation is:

78 9C 03 00 00 00 00 01

This is why FetchOperator.java:272 is called recursively, causing the stack overflow.

Regards,
Min

On Mon, Nov 2, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote:
> I have made a release candidate 0.4.1-rc0.
>
> We've fixed several critical bugs in hive release 0.4.0. We need hive release 0.4.1 out asap.
>
> Here is the list of changes:
> HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka)
> HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao)
> HIVE-878. Update the hash table entry before flushing in Group By hash aggregation. (Zheng Shao via namit)
> HIVE-882. Create a new directory every time for scratch. (Namit Jain via zshao)
> HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao)
> HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao)
> HIVE-883. URISyntaxException when partition value contains special chars. (Zheng Shao via namit)
>
> Please vote.
>
> --
> Yours,
> Zheng

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
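Those eight bytes are a complete zlib stream whose payload is empty, which is easy to verify, and an operator that advances past an empty split by calling itself once per empty file will exhaust the stack once there are enough of them. A quick illustration in Python (the get_next_row function below is a hypothetical sketch of the failure mode, not Hive's actual code):

```python
import sys
import zlib

# The 8-byte file each mapper wrote: 78 9C is the zlib header, 03 00 is an
# empty final deflate block, and 00 00 00 01 is the Adler-32 checksum of "".
data = bytes([0x78, 0x9C, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01])
assert zlib.decompress(data) == b""  # a valid stream with a zero-length payload

# Hypothetical sketch: skipping an empty file by recursing instead of
# looping overflows the stack once the number of consecutive empty files
# exceeds the recursion limit.
def get_next_row(files, i=0):
    if i == len(files):
        return None
    if files[i]:                       # non-empty file: return its first row
        return files[i][0]
    return get_next_row(files, i + 1)  # empty file: recurse into the next one

sys.setrecursionlimit(100)
try:
    get_next_row([[]] * 1000 + [["a row"]])
except RecursionError:
    print("stack exhausted, analogous to the StackOverflowError above")
```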
Re: [VOTE] hive release candidate 0.4.1-rc0
We use the zip codec by default. Some repeated frames were omitted from the error stack:

        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:272)

Thanks,
Min

On Mon, Nov 2, 2009 at 2:57 PM, Zheng Shao zsh...@gmail.com wrote:
> Min, can you check the default compression codec in your hadoop conf?
> The 8-byte file must be a compressed file, in the codec in question, that represents a 0-length file. It seems that codec was not able to decompress the stream.
>
> Zheng

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765362#action_12765362 ]

Min Zhou commented on HIVE-842:
-------------------------------

@Edward
Kerberos for authentication is a good approach, I think; user/password is not needed here. This issue could be implemented in the future. By the way, we've finished the development of the authorization infrastructure for Hive.

> Authentication Infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-842
>                 URL: https://issues.apache.org/jira/browse/HIVE-842
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Edward Capriolo
>
> This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [VOTE] vote for release candidate for hive
I saw it. +1, all tests passed.

On Wed, Sep 30, 2009 at 1:59 AM, Namit Jain nj...@facebook.com wrote:
> I did find the files:
>
> [nj...@dev029 /tmp]$ ls -lrt hive-0.4.0-dev-hadoop-0.19.0/src
> total 33580
> drwxr-xr-x 4 njain users     4096 Aug 11 16:41 docs
> drwxr-xr-x 7 njain users     4096 Aug 11 16:41 data
> -rw-r--r-- 1 njain users    15675 Aug 11 16:41 README.txt
> -rw-r--r-- 1 njain users     2810 Sep  2 10:44 TestTruncate.launch
> -rw-r--r-- 1 njain users     2804 Sep  2 10:44 TestMTQueries.launch
> -rw-r--r-- 1 njain users     2807 Sep  2 10:44 TestJdbc.launch
> -rw-r--r-- 1 njain users     2808 Sep  2 10:44 TestHive.launch
> -rw-r--r-- 1 njain users     2805 Sep  2 10:44 TestCliDriver.launch
> -rw-r--r-- 1 njain users    17045 Sep 10 15:16 build.xml
> -rw-r--r-- 1 njain users      850 Sep 10 15:16 build.properties
> -rw-r--r-- 1 njain users    12520 Sep 10 15:16 build-common.xml
> -rw-r--r-- 1 njain users    33431 Sep 17 18:15 CHANGES.txt
> -rw-r--r-- 1 njain users     1071 Sep 18 13:26 runscr
> -rw-r--r-- 1 njain users 23392371 Sep 18 13:26 hive-0.4.0-hadoop-0.20.0-dev.tar.gz
> -rw-r--r-- 1 njain users 10735695 Sep 18 13:27 hive-0.4.0-hadoop-0.20.0-bin.tar.gz
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 jdbc
> drwxr-xr-x 2 njain users     4096 Sep 29 10:54 ivy
> drwxr-xr-x 4 njain users     4096 Sep 29 10:54 hwi
> drwxr-xr-x 4 njain users     4096 Sep 29 10:54 eclipse-templates
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 contrib
> drwxr-xr-x 2 njain users     4096 Sep 29 10:54 conf
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 common
> drwxr-xr-x 4 njain users     4096 Sep 29 10:54 cli
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 ant
> drwxr-xr-x 2 njain users     4096 Sep 29 10:54 testutils
> drwxr-xr-x 2 njain users     4096 Sep 29 10:54 testlibs
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 shims
> drwxr-xr-x 6 njain users     4096 Sep 29 10:54 service
> drwxr-xr-x 4 njain users     4096 Sep 29 10:54 serde
> drwxr-xr-x 5 njain users     4096 Sep 29 10:54 ql
> drwxr-xr-x 4 njain users     4096 Sep 29 10:54 odbc
> drwxr-xr-x 6 njain users     4096 Sep 29 10:54 metastore
> drwxr-xr-x 2 njain users     4096 Sep 29 10:54 lib
> drwxr-xr-x 3 njain users     4096 Sep 29 10:54 bin
>
> I have attached the output.
>
> -namit
we can not pass unit test of trunk.
Hi all,
below is a failure:

[junit] Begin query: input41.q
[junit] plan = /tmp/plan37765.xml
[junit] plan = /tmp/plan37766.xml
[junit] plan = /tmp/plan37767.xml
[junit] plan = /tmp/plan37768.xml
[junit] diff -a -I \(file:\)\|\(/tmp/.*\) -I lastUpdateTime -I lastAccessTime -I owner /home/hivetest/hive-trunk/build/ql/test/logs/clientpositive/input41.q.out /home/hivetest/hive-trunk/ql/src/test/results/clientpositive/input41.q.out
[junit] 7,8c7
[junit] < Output: file:/home/hivetest/hive-trunk/build/ql/tmp/1868499757/1
[junit] < 0
[junit] ---
[junit] > Output: file:/data/users/njain/hive1/hive1/build/ql/tmp/607183026/1
[junit] 9a9
[junit] > 0
[junit] Exception: Client execution results failed with error code = 1
[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
[junit]     at junit.framework.Assert.fail(Assert.java:47)
[junit]     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input41(TestCliDriver.java:5010)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]     at java.lang.reflect.Method.invoke(Method.java:597)
[junit]     at junit.framework.TestCase.runTest(TestCase.java:154)
[junit]     at junit.framework.TestCase.runBare(TestCase.java:127)
[junit]     at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit]     at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit]     at junit.framework.TestResult.run(TestResult.java:109)
[junit]     at junit.framework.TestCase.run(TestCase.java:118)
[junit]     at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit]     at junit.framework.TestSuite.run(TestSuite.java:203)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
[junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)

$ cat test-0.19.0.log | grep Failures
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.297 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.637 sec
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.777 sec
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.323 sec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.422 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.308 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.364 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.354 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.363 sec
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.389 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.379 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 13.234 sec
[junit] Tests run: 338, Failures: 3, Errors: 0, Time elapsed: 3,955.436 sec
[junit] Tests run: 75, Failures: 0, Errors: 0, Time elapsed: 208.786 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 45.511 sec
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 36.822 sec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 14.556 sec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.721 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.117 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.42 sec
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.038 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.055 sec
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.869 sec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.706 sec
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.301 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.338 sec
[junit] Tests run: 44, Failures: 0, Errors: 0, Time elapsed: 155.071 sec
[junit] Tests run: 33, Failures: 0, Errors: 0, Time elapsed: 108.604 sec
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.588 sec

We ran the tests with: ant test -Dhadoop.version=0.19.0

Thanks,
Min

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
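For reference, the diff command in the log above is the golden-file comparison the test harness runs: lines matching the -I patterns (file: URIs, /tmp/ paths, timestamps, owner) are treated as machine-specific and masked, so only genuine output differences fail a test. A rough Python approximation of that masking, with made-up sample lines (diff's -I semantics are subtler — it ignores a hunk only when every changed line in it matches — but the idea is the same):

```python
import re

# patterns the harness passes to diff -I (machine-specific noise)
MASKS = [r"file:", r"/tmp/.*", r"lastUpdateTime", r"lastAccessTime", r"owner"]

def normalize(lines):
    # drop lines that only differ by machine-specific detail
    return [l for l in lines if not any(re.search(m, l) for m in MASKS)]

# the masked path difference alone would not fail the test ...
actual   = ["Output: file:/home/hivetest/build/ql/tmp/1868499757/1", "0"]
expected = ["Output: file:/data/users/njain/build/ql/tmp/607183026/1", "0"]
assert normalize(actual) == normalize(expected)

# ... but the extra, unmasked "0" row reported in the failure still differs
assert normalize(actual + ["0"]) != normalize(expected)
```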
Re: [VOTE] vote for release candidate for hive
Hi Namit,
I meant http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.19.0-dev.tar.gz

Min

On Wed, Sep 23, 2009 at 5:31 AM, Namit Jain nj...@facebook.com wrote:
> Which one are you looking at? I downloaded just now from:
> http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.20.0-dev.tar.gz
> and it contains CHANGES.txt, build.xml, etc. Did you download the binary tarball?
>
> Thanks,
> -namit

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
Re: we can not pass unit test of trunk.
We now use ant test -Dhadoop.version=0.19.0 -Doverwrite=true, and it passes. Can anyone give me an explanation?

Thanks,
Min
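A plausible reading of -Doverwrite=true (hedged — this is how golden-file harnesses typically behave, inferred rather than read from Hive's build files): in overwrite mode the actual query output is written over the checked-in .q.out file instead of being diffed against it, so the run trivially passes and the new output becomes the baseline. A minimal sketch with a hypothetical check() helper:

```python
import tempfile
from pathlib import Path

def check(actual: str, expected_file: Path, overwrite: bool = False) -> bool:
    # overwrite mode: accept the actual output as the new golden file
    if overwrite:
        expected_file.write_text(actual)
        return True
    return actual == expected_file.read_text()

with tempfile.TemporaryDirectory() as d:
    golden = Path(d) / "input41.q.out"
    golden.write_text("old expected\n")
    assert not check("new output\n", golden)               # normal run fails
    assert check("new output\n", golden, overwrite=True)   # regenerate baseline
    assert check("new output\n", golden)                   # subsequent runs pass
```

The catch, of course, is that an overwritten baseline also silently accepts any real regression, which is why the diff failed in the first place.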
Re: [VOTE] vote for release candidate for hive
Hi Namit,
I haven't found build.xml or CHANGES.txt in your tarball. They must be included so that we can test it and check the changes, I think.

Thanks,
Min

On Sat, Sep 19, 2009 at 4:42 AM, Namit Jain nj...@facebook.com wrote:
> It is available from http://people.apache.org/~namit/
>
> Thanks,
> -namit
>
> -----Original Message-----
> From: Ashish Thusoo
> Sent: Thursday, September 17, 2009 11:55 PM
> To: hive-dev@hadoop.apache.org; Namit Jain
> Subject: RE: [VOTE] vote for release candidate for hive
>
> Namit,
> Can you make it available from http://people.apache.org/~njain/ ? That way, people who do not have access to the apache machines will also be able to try the candidate.
>
> Thanks,
> Ashish
>
> ________________________________________
> From: Namit Jain [nj...@facebook.com]
> Sent: Thursday, September 17, 2009 6:32 PM
> To: Namit Jain; hive-dev@hadoop.apache.org
> Subject: [VOTE] vote for release candidate for hive
>
> Following the convention
>
> -----Original Message-----
> From: Namit Jain
> Sent: Thursday, September 17, 2009 6:31 PM
> To: hive-dev@hadoop.apache.org
> Subject: vote for release candidate for hive
>
> I have created another release candidate for Hive.
> https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/
>
> Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838
>
> The tarball can be found at:
> people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz
>
> Thanks,
> -namit

--
My research interests are distributed systems, parallel computing and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
[jira] Commented: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758112#action_12758112 ]

Min Zhou commented on HIVE-78:
------------------------------

@Namit
Got your meaning. We are maintaining a version of our own; it will take a couple of weeks to adapt it to the trunk.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757616#action_12757616 ]

Min Zhou commented on HIVE-78:
------------------------------

sorry,
{noformat}
public class GenericAuthenticator extends Authenticator {
  public GenericAuthenticator(Hive db, User user);
  ...
}
{noformat}

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757622#action_12757622 ]

Min Zhou commented on HIVE-78:
------------------------------

oops, my code wasn't on this machine. I just pasted yours and modified it into mine. Here is a patch showing my code for that.

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-78) Authorization infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-78:
-------------------------

    Attachment: createuser-v1.patch

> Authorization infrastructure for Hive
> -------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756904#action_12756904 ]

Min Zhou commented on HIVE-78:
------------------------------

Let me guess: you are all talking about the CLI. But we are using HiveServer as a multi-user server, like mysqld, not something that supports only one user.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756949#action_12756949 ]

Min Zhou commented on HIVE-78:
------------------------------

I do not think the HiveServer you have in mind is the same as mine, which supports multiple users, not just one.

> Authentication infrastructure for Hive
> --------------------------------------
>
>                 Key: HIVE-78
>                 URL: https://issues.apache.org/jira/browse/HIVE-78
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Server Infrastructure
>            Reporter: Ashish Thusoo
>            Assignee: Edward Capriolo
>         Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff
>
> Allow hive to integrate with existing user repositories for authentication and authorization information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756951#action_12756951 ] Min Zhou commented on HIVE-78: -- From the words you commented: {noformat} Daemons like HiveService and HiveWebInterface will have to run as supergroup or a hive group? {noformat}
[jira] Updated: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-78: - Attachment: hive-78-metadata-v1.patch
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756335#action_12756335 ] Min Zhou commented on HIVE-78: -- @Edward Sorry for my abuse of some words; I hope it will not affect our work. Can you point me to the JIRAs where it was decided that username/password information will be stored in Hadoop rather than in Hive? I think most companies are using Hadoop versions from 0.17 to 0.20, which do not have good password security. Once a company adopts a particular version, upgrading is a very important issue, and many companies will stick with a more stable version. Moreover, Hadoop still does not have that feature, which may take a very long time to implement. Why should we wait for it rather than implement it ourselves? I think Hive needs to support user/password at least for current versions of Hadoop. Many companies using Hive have reported that it is inconvenient for multi-user deployments, with respect to environment isolation, table sharing, security, etc. We must try to meet the requirements of most of them. Regarding the syntax, I guess we can do it in two steps: # support GRANT/REVOKE of privileges to users; # support some sort of server administration privileges, as Ashish mentioned. The GRANT statement enables system administrators to create Hive user accounts and to grant rights to accounts. To use GRANT, you must have the GRANT OPTION privilege, and you must have the privileges that you are granting. The REVOKE statement is related and enables administrators to remove account privileges. File hive-78-syntax-v1.patch modifies the syntax. Any comments on that?
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755876#action_12755876 ] Min Zhou commented on HIVE-78: -- We will take over this issue; it should be finished in two weeks. Here are the SQL statements that will be added: {noformat} CREATE USER; DROP USER; ALTER USER SET PASSWORD; GRANT; REVOKE {noformat} Metadata is stored in some sort of persistent medium, such as a MySQL DBMS, through JDO. We will add three tables for this issue: USER, DBS_PRIV, and TABLES_PRIV. Privileges can be granted at several levels, and each table above corresponds to a privilege level. # Global level: global privileges apply to all databases on a given server. These privileges are stored in the USER table. GRANT ALL ON *.* and REVOKE ALL ON *.* grant and revoke only global privileges. GRANT ALL ON *.* TO 'someuser'; GRANT SELECT, INSERT ON *.* TO 'someuser'; # Database level: database privileges apply to all objects in a given database. These privileges are stored in the DBS_PRIV table. GRANT ALL ON db_name.* and REVOKE ALL ON db_name.* grant and revoke only database privileges. GRANT ALL ON mydb.* TO 'someuser'; GRANT SELECT, INSERT ON mydb.* TO 'someuser'; Although we can't create databases currently, this reserves a place until Hive supports them. # Table level: table privileges apply to all columns in a given table. These privileges are stored in the TABLES_PRIV table. GRANT ALL ON db_name.tbl_name and REVOKE ALL ON db_name.tbl_name grant and revoke only table privileges. GRANT ALL ON mydb.mytbl TO 'someuser'; GRANT SELECT, INSERT ON mydb.mytbl TO 'someuser'; Hive account information is stored in the USER table, including username, password, and global privileges. A user who has been granted any privilege on a particular table, such as select/insert/drop, always has the right to show that table.
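For concreteness, the three proposed privilege tables might look roughly like the sketch below. This is only an illustration of the global/database/table levels described above; the column names (user_name, db_name, tbl_name, the *_priv flags) are assumptions, not taken from the attached patches.

```sql
-- Hypothetical layout of the proposed metastore tables; columns are
-- illustrative only, not the actual schema from hive-78-metadata-v1.patch.
CREATE TABLE USER (            -- global level: applies to all databases
  user_name   VARCHAR(128) PRIMARY KEY,
  password    VARCHAR(128),
  select_priv BOOLEAN,
  insert_priv BOOLEAN,
  drop_priv   BOOLEAN
);

CREATE TABLE DBS_PRIV (        -- database level: applies to one database
  user_name   VARCHAR(128),
  db_name     VARCHAR(128),
  select_priv BOOLEAN,
  insert_priv BOOLEAN,
  PRIMARY KEY (user_name, db_name)
);

CREATE TABLE TABLES_PRIV (     -- table level: applies to one table
  user_name   VARCHAR(128),
  db_name     VARCHAR(128),
  tbl_name    VARCHAR(128),
  select_priv BOOLEAN,
  insert_priv BOOLEAN,
  PRIMARY KEY (user_name, db_name, tbl_name)
);
```

A privilege check would then consult USER first and fall back to DBS_PRIV and TABLES_PRIV, mirroring the global, database, and table levels in that order.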
[jira] Updated: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-78: - Attachment: hive-78-syntax-v1.patch
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755882#action_12755882 ] Min Zhou commented on HIVE-78: -- We currently use separate MySQL databases to achieve an isolated CLI environment, which is not practical. An authentication infrastructure is urgently needed for us. Almost all statements would be affected, for example: SELECT, INSERT, SHOW TABLES, SHOW PARTITIONS, DESCRIBE TABLE, MSCK, CREATE TABLE, CREATE FUNCTION (we are considering how to control who can create UDFs), DROP TABLE, DROP FUNCTION, and LOAD; plus GRANT/REVOKE themselves, and CREATE USER/DROP USER/SET PASSWORD. It even includes some non-SQL commands like set, add file, and add jar.
[jira] Commented: (HIVE-818) Create a Hive CLI that connects to hive ThriftServer
[ https://issues.apache.org/jira/browse/HIVE-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752852#action_12752852 ] Min Zhou commented on HIVE-818: --- This feature looks pretty good to us; we were looking for a CLI-mode client for the Hive server. Create a Hive CLI that connects to hive ThriftServer Key: HIVE-818 URL: https://issues.apache.org/jira/browse/HIVE-818 Project: Hadoop Hive Issue Type: New Feature Components: Clients, Server Infrastructure Reporter: Edward Capriolo Assignee: Edward Capriolo We should have an alternate CLI that works by interacting with the HiveServer; this way it will be ready when/if we deprecate the current CLI.
[jira] Created: (HIVE-814) Exception when altering an int-typed column to date/datetime/timestamp
Exception when altering an int-typed column to date/datetime/timestamp - Key: HIVE-814 URL: https://issues.apache.org/jira/browse/HIVE-814 Project: Hadoop Hive Issue Type: Bug Reporter: Min Zhou As far as I know, time types can only be used in partitions; normal columns are not allowed to have those types. However, it turns out that a non-time-typed column can be altered to date/datetime/timestamp, and exceptions are then thrown when describing the table. hive> create table pokes(foo int, bar string); OK Time taken: 0.894 seconds hive> alter table pokes replace columns(foo date, bar string); OK Time taken: 0.266 seconds hive> describe pokes; FAILED: Error in metadata: MetaException(message:java.lang.IllegalArgumentException Error: type expected at the position 0 of 'date:string' but 'date' is found.) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Re: [VOTE] Branching and releasing for 0.4.0
+1 all tests pass on my machine. Min On Sat, Aug 8, 2009 at 5:57 PM, Amr Awadallah a...@cloudera.com wrote: +1 Namit Jain wrote: +1 On 8/6/09 5:16 PM, Zheng Shao zsh...@gmail.com wrote: +1 On Thu, Aug 6, 2009 at 5:10 PM, Ashish Thusooathu...@facebook.com wrote: Hi Folks, The following is a proposed schedule for 0.4.0 branching. Please vote on it by tomorrow and say whether you are ok with it or not. 8/7/2009 - Branch out 0.4.0 (branching to happen by mid night) 8/14/2009 - Code freeze on 0.4.0 8/28/2009 - Release 0.4.0 Thanks, Ashish -- Yours, Zheng -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
[jira] Commented: (HIVE-607) Create statistical UDFs.
[ https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736473#action_12736473 ] Min Zhou commented on HIVE-607: --- @Namit I implemented group_cat() in a rush, and found something difficult slove: 1. function group_cat() has a internal order by clause, currently, we can't such aggregation in hive. 2. when the string will be group concated is too large, in another is appears data skew, there is ofen not enough memory to store such a big string. Create statistical UDFs. Key: HIVE-607 URL: https://issues.apache.org/jira/browse/HIVE-607 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: S. Alex Smith Assignee: Emil Ibrishimov Priority: Minor Fix For: 0.4.0 Attachments: HIVE-607.1.patch, UDAFStddev.java Create UDFs replicating: STD() Return the population standard deviation STDDEV_POP()(v5.0.3) Return the population standard deviation STDDEV_SAMP()(v5.0.3) Return the sample standard deviation STDDEV() Return the population standard deviation SUM() Return the sum VAR_POP()(v5.0.3) Return the population standard variance VAR_SAMP()(v5.0.3) Return the sample variance VARIANCE()(v4.1) Return the population standard variance as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.
[jira] Commented: (HIVE-607) Create statistical UDFs.
[ https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736475#action_12736475 ] Min Zhou commented on HIVE-607: --- sorry, some typos @Namit I've implemented group_cat() in a rush, and found some things difficult to solve: 1. group_cat() has an internal order-by clause; currently, we can't implement such an aggregation in Hive. 2. when the strings to be group-concatenated are too large -- in other words, when data skew appears -- there is often not enough memory to store such a big result.
[jira] Updated: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions
[ https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-702: -- Attachment: HIVE-702.1.patch patch DROP TEMPORARY FUNCTION should not drop builtin functions - Key: HIVE-702 URL: https://issues.apache.org/jira/browse/HIVE-702 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Min Zhou Attachments: HIVE-702.1.patch Only temporary functions should be dropped. It should error out if the user tries to drop built-in functions.
[jira] Commented: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions
[ https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736762#action_12736762 ] Min Zhou commented on HIVE-702: --- Please wait a moment; I haven't dealt with the conflict you mentioned yet.
[jira] Updated: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions
[ https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-702: -- Attachment: HIVE-702.2.patch done
[jira] Commented: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions
[ https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736767#action_12736767 ] Min Zhou commented on HIVE-702: --- That patch hasn't been tested, because I'm at home and cannot connect to the company's VPN.
[jira] Assigned: (HIVE-700) Fix test error by adding DROP FUNCTION
[ https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou reassigned HIVE-700: - Assignee: Min Zhou Fix test error by adding DROP FUNCTION Key: HIVE-700 URL: https://issues.apache.org/jira/browse/HIVE-700 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao Assignee: Min Zhou Since we added Show Functions in HIVE-580, test results will depend on what temporary functions are added to the system. We should add the capability of DROP FUNCTION, and do that at the end of those create function tests to make sure the show functions results are deterministic.
[jira] Updated: (HIVE-649) [UDF] now() for getting current time
[ https://issues.apache.org/jira/browse/HIVE-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-649: -- Attachment: HIVE-649.patch patch [UDF] now() for getting current time Key: HIVE-649 URL: https://issues.apache.org/jira/browse/HIVE-649 Project: Hadoop Hive Issue Type: New Feature Reporter: Min Zhou Attachments: HIVE-649.patch http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now
[jira] Updated: (HIVE-700) Fix test error by adding DROP FUNCTION
[ https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-700: -- Attachment: HIVE-700.1.patch usage: drop function function_name
[jira] Commented: (HIVE-700) Fix test error by adding DROP FUNCTION
[ https://issues.apache.org/jira/browse/HIVE-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736435#action_12736435 ] Min Zhou commented on HIVE-700: --- Sorry for the delay; we had a training session today. I will update a new patch for the HIVE-700-related JIRAs.
[jira] Assigned: (HIVE-702) DROP TEMPORARY FUNCTION should not drop builtin functions
[ https://issues.apache.org/jira/browse/HIVE-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou reassigned HIVE-702: - Assignee: Min Zhou
[jira] Commented: (HIVE-642) udf equivalent to string split
[ https://issues.apache.org/jira/browse/HIVE-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733641#action_12733641 ] Min Zhou commented on HIVE-642: --- It's very useful for us. Some comments: # Can you implement it directly with Text? Avoiding string decoding and encoding would be faster. Of course, that trick may lead to another problem, as String.split uses a regular expression for splitting. # getDisplayString() always returns a string in lowercase. udf equivalent to string split -- Key: HIVE-642 URL: https://issues.apache.org/jira/browse/HIVE-642 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Emil Ibrishimov Fix For: 0.4.0 Attachments: HIVE-642.1.patch, HIVE-642.2.patch It would be very useful to have a function equivalent to string split in java
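The Text-based suggestion above amounts to splitting raw UTF-8 bytes instead of first decoding them to a String. A minimal sketch of the idea, using plain byte arrays rather than Hadoop's Text class (the class and method names are hypothetical, and this only handles a single-byte ASCII delimiter, not a regular expression):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration of the byte-level split suggested above: for a single-byte
// (ASCII) delimiter, a UTF-8 buffer can be split without decoding it to a
// String, because an ASCII byte never occurs inside a multi-byte UTF-8
// sequence.
public class ByteSplit {
    public static List<byte[]> split(byte[] utf8, byte delim) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= utf8.length; i++) {
            // A part ends at each delimiter and at the end of the buffer.
            if (i == utf8.length || utf8[i] == delim) {
                byte[] part = new byte[i - start];
                System.arraycopy(utf8, start, part, 0, i - start);
                parts.add(part);
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        byte[] buf = "a,b,c".getBytes(StandardCharsets.UTF_8);
        for (byte[] p : split(buf, (byte) ',')) {
            System.out.println(new String(p, StandardCharsets.UTF_8));
        }
    }
}
```

The same loop would work directly against Text.getBytes() and Text.getLength(), which is where the decode/encode savings come from.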
[jira] Commented: (HIVE-599) Embedded Hive SQL into Python
[ https://issues.apache.org/jira/browse/HIVE-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733429#action_12733429 ] Min Zhou commented on HIVE-599: --- I agree with Namit and Yongqiang. I was thinking about creating functions with a format like the one below: {noformat} create function function_name (argument list) as python { python udf code } create function function_name (argument list) as java { java udf code } {noformat} We can dynamically compile those kinds of code using Jython and com.sun.tools.javac respectively. It would be better to store the Python or Java UDF bytecode in the persistent metastore (typically MySQL) after creation; then we can call that function again without a second function creation. Embedded Hive SQL into Python - Key: HIVE-599 URL: https://issues.apache.org/jira/browse/HIVE-599 Project: Hadoop Hive Issue Type: New Feature Reporter: Ashish Thusoo Assignee: Ashish Thusoo While Hive does SQL it would be very powerful to be able to embed that SQL in languages like python in such a way that the hive query is also able to invoke python functions seemlessly. One possibility is to explore integration with Dumbo. Another is to see if the internal map_reduce.py tool can be open sourced as a Hive contrib. Other thoughts?
[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733015#action_12733015 ] Min Zhou commented on HIVE-512: --- Can you answer my question about these queries? [GenericUDF] new string function ELT(N,str1,str2,str3,...) --- Key: HIVE-512 URL: https://issues.apache.org/jira/browse/HIVE-512 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-512.2.patch, HIVE-512.patch ELT(N,str1,str2,str3,...) Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD(). {noformat} mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo'); -> 'ej' mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo'); -> 'foo' {noformat}
[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733016#action_12733016 ] Min Zhou commented on HIVE-512: --- select(1, '2', 3) select(2, '2', 3) select(1, true, 3) select(2, 2.0, cast(3 as double)) If we don't uniformly return strings, it would be confusing for users to determine which type will be returned.
[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733103#action_12733103 ] Min Zhou commented on HIVE-512: --- If you inspect the implementation of CASE, you will see that it does not accept arguments of different types. See GenericUDFCase.java and GenericUDFWhen.java: {code} hive> select case when true then '2' else 3 end from pokes limit 1; FAILED: Error in semantic analysis: line 1:36 Argument Type Mismatch 3: The expression after ELSE should have the same type as those after THEN: string is expected but int is found {code} elt is a string function; confusion will be caused if we casually change its behavior. There is no need to make things more complex.
It's strange: two if clauses with the same logic
Hi, see ObjectInspectorConverters.java:78

case STRING:
  if (outputOI instanceof WritableStringObjectInspector) {
    return new PrimitiveObjectInspectorConverter.TextConverter(
        (PrimitiveObjectInspector) inputOI);
  } else if (outputOI instanceof WritableStringObjectInspector) {
    return new PrimitiveObjectInspectorConverter.TextConverter(
        (PrimitiveObjectInspector) inputOI);
  }

The second clause has the same logic as the first one, so it can never be reached. I guess something is wrong; maybe one branch was meant for WritableStringObjectInspector, and the other for JavaStringObjectInspector. Min -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12732911#action_12732911 ] Min Zhou commented on HIVE-512: --- Here is the definition of elt: return the string at the index number. It's essentially a string function. select elt(1, 2, 3) will return a varbinary in mysql, rather than an int. I still insist that returning string is better. Even if we do it as you said, what type of result will be returned for queries like the ones below? select(1, '2', 3) select(2, '2', 3) select(1, true, 3) select(2, 2.0, cast(3 as double))
[jira] Commented: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732828#action_12732828 ] Min Zhou commented on HIVE-512: --- Actually, ELT returns only two types of results in MySQL: varbinary and varchar. varchar is returned if all arguments are varchars; otherwise varbinary is returned.

mysql> create table t3 as select elt(1, 'a', 3);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t3;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| elt(1, 'a', 3) | varbinary(1) | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> create table t4 as select elt(1, true, false);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t4;
+---------------------+--------------+------+-----+---------+-------+
| Field               | Type         | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| elt(1, true, false) | varbinary(1) | YES  |     | NULL    |       |
+---------------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> create table t5 as select elt(1, 2.0, false);
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> describe t5;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| elt(1, 2.0, false) | varbinary(4) | YES  |     | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+
1 row in set (0.00 sec)

Based on the above, I think it is better to return string, as binary is not commonly used in hive. [GenericUDF] new string function ELT(N,str1,str2,str3,...) --- Key: HIVE-512 URL: https://issues.apache.org/jira/browse/HIVE-512 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-512.2.patch, HIVE-512.patch ELT(N,str1,str2,str3,...) Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD().
{noformat}
mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
        -> 'ej'
mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
        -> 'foo'
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
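The ELT semantics quoted above can be sketched in plain Java — a simplified model of the behavior only, not Hive's actual GenericUDF code (which evaluates through ObjectInspectors); the class name here is illustrative:

```java
public class Elt {
    /**
     * Returns the n-th string argument (1-based), mirroring MySQL's
     * ELT(N, str1, str2, ...). Returns null when N is less than 1 or
     * greater than the number of arguments, as the issue describes.
     */
    public static String elt(int n, String... args) {
        if (n < 1 || n > args.length) {
            return null; // MySQL returns NULL for out-of-range N
        }
        return args[n - 1];
    }
}
```

Running this against the examples from the issue gives 'ej' for N = 1 and 'foo' for N = 4.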
[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE
[ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731858#action_12731858 ] Min Zhou commented on HIVE-541: --- all test cases passed on my side, how's yours? Implement UDFs: INSTR and LOCATE Key: HIVE-541 URL: https://issues.apache.org/jira/browse/HIVE-541 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Attachments: HIVE-541.1.patch, HIVE-541.2.patch http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate These functions can be directly implemented with Text (instead of String). This will make the test of whether one string contains another string much faster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
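The Text-based idea in the issue — testing whether one string contains another without decoding bytes into java.lang.String — amounts to a byte-level search. A minimal sketch, using plain byte arrays as a stand-in for org.apache.hadoop.io.Text (the method name findBytes is illustrative, not the actual Hive API):

```java
public class ByteSearch {
    /**
     * Returns the index of the first occurrence of needle in haystack
     * at or after start, or -1 if absent. Works directly on the raw
     * UTF-8 bytes, so no String encoding/decoding is needed.
     */
    public static int findBytes(byte[] haystack, byte[] needle, int start) {
        if (needle.length == 0) {
            return start <= haystack.length ? start : -1;
        }
        for (int i = start; i <= haystack.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) return i; // full match at position i
        }
        return -1;
    }
}
```

Because UTF-8 is self-synchronizing, a byte-wise match of a valid UTF-8 needle is also a character-wise match, which is what makes this approach safe as well as fast.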
[jira] Resolved: (HIVE-515) [UDF] new string function INSTR(str,substr)
[ https://issues.apache.org/jira/browse/HIVE-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou resolved HIVE-515. --- Resolution: Duplicate duplicates [#HIVE-541] [UDF] new string function INSTR(str,substr) --- Key: HIVE-515 URL: https://issues.apache.org/jira/browse/HIVE-515 Project: Hadoop Hive Issue Type: New Feature Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-515-2.patch, HIVE-515.patch UDF for string function INSTR(str,substr). This extends the function from MySQL http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_instr usage: INSTR(str, substr) INSTR(str, substr, start) example:
{code:sql}
select instr('abcd', 'abc') from pokes;      -- all results are '1'
select instr('abcabc', 'ccc') from pokes;    -- all results are '0'
select instr('abcabc', 'abc', 2) from pokes; -- all results are '4'
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
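The 1-based INSTR semantics shown in the examples can be modeled in plain Java. This is a sketch of the behavior only, not the patch itself (the real UDF works on Hive's Text objects):

```java
public class Instr {
    /** INSTR(str, substr): 1-based position of substr in str, 0 if absent. */
    public static int instr(String str, String substr) {
        return instr(str, substr, 1);
    }

    /** INSTR(str, substr, start): the search begins at 1-based position start. */
    public static int instr(String str, String substr, int start) {
        if (start < 1) {
            return 0; // positions are 1-based; invalid start finds nothing
        }
        int idx = str.indexOf(substr, start - 1);
        return idx < 0 ? 0 : idx + 1; // convert 0-based index back to 1-based
    }
}
```

This reproduces the three example queries: instr('abcd','abc') is 1, instr('abcabc','ccc') is 0, and instr('abcabc','abc',2) is 4 because the search skips the match at position 1.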
[jira] Created: (HIVE-649) [UDF] now() for getting current time
[UDF] now() for getting current time Key: HIVE-649 URL: https://issues.apache.org/jira/browse/HIVE-649 Project: Hadoop Hive Issue Type: New Feature Reporter: Min Zhou http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_now -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
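A minimal sketch of what such a now() function could evaluate to, formatted like MySQL's 'YYYY-MM-DD hh:mm:ss' output. The class name and the choice of format are assumptions — the issue itself only links to the MySQL documentation:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class Now {
    // MySQL NOW() renders as e.g. 2009-07-20 14:03:21
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Current local time as a string, MySQL NOW()-style. */
    public static String now() {
        return LocalDateTime.now().format(FMT);
    }
}
```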
[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE
[ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731764#action_12731764 ] Min Zhou commented on HIVE-541: --- Hmm, it may be a good way. I will try it soon. Implement UDFs: INSTR and LOCATE Key: HIVE-541 URL: https://issues.apache.org/jira/browse/HIVE-541 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Attachments: HIVE-541.1.patch http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate These functions can be directly implemented with Text (instead of String). This will make the test of whether one string contains another string much faster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE
[ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-541: -- Attachment: HIVE-541.2.patch Added a GenericUDFUtils.findText() that avoids string encoding and decoding, so faster execution is gained. Implement UDFs: INSTR and LOCATE Key: HIVE-541 URL: https://issues.apache.org/jira/browse/HIVE-541 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Attachments: HIVE-541.1.patch, HIVE-541.2.patch http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate These functions can be directly implemented with Text (instead of String). This will make the test of whether one string contains another string much faster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-329) start and stop hive thrift server in daemon mode
[ https://issues.apache.org/jira/browse/HIVE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou reassigned HIVE-329: - Assignee: Min Zhou start and stop hive thrift server in daemon mode - Key: HIVE-329 URL: https://issues.apache.org/jira/browse/HIVE-329 Project: Hadoop Hive Issue Type: New Feature Components: Server Infrastructure Affects Versions: 0.3.0 Reporter: Min Zhou Assignee: Min Zhou Attachments: daemon.patch I wrote two shell scripts to start and stop the hive thrift server more conveniently. usage:
bin/hive --service start-hive [HIVE_PORT]
bin/hive --service stop-hive
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.
[ https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-555: -- Attachment: HIVE-555-4.patch Added a copy of UDAF to avoid [HIVE-620|http://issues.apache.org/jira/browse/HIVE-620] so that all test cases pass. create temporary function support not only udf, but also udaf, genericudf, etc. Key: HIVE-555 URL: https://issues.apache.org/jira/browse/HIVE-555 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch, HIVE-555-4.patch Right now, the command 'create temporary function' only supports udf. We can also let users write their udaf and generic udf, and write generic udaf in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Some wish after serious consideration
Hi all, Having focused on hive for several months, here are some wishes of mine after serious consideration.

1. All auto-generated code for hive was produced under the facebook commercial version of thrift, which is older than the open source one. This leads to lots of compatibility problems and cuts us off from help from the open source community. We need to remove that code as soon as possible, but it seems the progress on this issue has stopped.

2. Please give us a clear roadmap. We also have a plan for improving hive, but our patches would probably go uncared-for, because they are not on facebook's schedule. If things go on like this, there will be a lot of compatibility problems brought in by other commits, and we will be suffering from fixing conflicts again and again.

3. Please don't commit code so rashly. Code from Ashish could easily be committed by others without strict examination, which caused a lot of problems when we used it here: bugs, and incondite code that is hard to read and to extend. Perhaps the main reason is that Ashish is the leader of Hive. Another person, Namit, has often committed buggy or ugly code. I have a suggestion: more discussion and tests with the help of the open source community. Code quality would be raised that way. (I don't intend any personal attacks here.)

Regards, Min -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
[jira] Created: (HIVE-618) More human-readable error prompt of FunctionTask
More human-readable error prompt of FunctionTask Key: HIVE-618 URL: https://issues.apache.org/jira/browse/HIVE-618 Project: Hadoop Hive Issue Type: Improvement Reporter: Min Zhou current prompt:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
{noformat}
Zheng suggested that something like the below would be better:
{noformat}
Class not found
Class does not implement UDF, GenericUDF, or UDAF
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
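The suggested messages could come from a validation step along these lines. This is a sketch only: the nested marker interfaces are stand-ins for Hive's real UDF/GenericUDF/UDAF base classes, and the method name is hypothetical:

```java
public class FunctionCheck {
    // Stand-ins for Hive's actual function base classes.
    interface UDF {}
    interface GenericUDF {}
    interface UDAF {}

    /**
     * Returns null when the class is usable as a function,
     * otherwise a human-readable error message as suggested
     * in HIVE-618.
     */
    public static String validate(String className) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return "Class not found";
        }
        if (!UDF.class.isAssignableFrom(cls)
                && !GenericUDF.class.isAssignableFrom(cls)
                && !UDAF.class.isAssignableFrom(cls)) {
            return "Class does not implement UDF, GenericUDF, or UDAF";
        }
        return null;
    }
}
```

FunctionTask could then print the returned message instead of only a bare return code.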
Re: Some wish after serious consideration
I have been watching HIVE-438 for a long time; you know that's a critical change impacting almost the whole hive source tree, and a quick resolution is needed. It's understandable that facebook's human resources are stretched thin, as developers there always join several projects at the same time. Therefore, we should use the power of the open source community to speed up its development. But right now, my feeling is that everyone only cares about their own affairs, regardless of what other people do. This is not the pattern of the open source community, yet we are still immersed in this pattern.

2009/7/9 He Yongqiang heyongqi...@software.ict.ac.cn:
> But I think no conflicts can be guaranteed, since conflicts are not raised by one patch. If no conflict appears to this patch, then there will be conflicts for other patches.

My meaning was not to refuse conflicts; there should be a way to reduce their frequency if we work together. I mean: "But I think no conflicts can NOT be guaranteed, since conflicts are not raised by one patch. If no conflict appears to this patch, then there will be conflicts for other patches."

On 09-7-9 2:43 PM, He Yongqiang heyongqi...@software.ict.ac.cn wrote: On 09-7-9 2:14 PM, Min Zhou coderp...@gmail.com wrote: Hi all, Having focused on hive for several months, here are some wishes of mine after serious consideration. 1. All auto-generated code for hive was produced under the facebook commercial version of thrift, which is older than the open source one; this leads to lots of compatibility problems and cuts us off from help from the open source community. We need to remove it as soon as possible, but it seems the progress on this issue has stopped. See HIVE-438. I think it will be committed by this weekend? 2. Please give us a clear roadmap. We also have a plan for improving hive, but our patches would probably go uncared-for, because they are not on facebook's schedule.
If things go on like this, there will be a lot of compatibility problems brought in by other commits, and we will be suffering from fixing conflicts again and again. I think the hive roadmap on the hive wiki page has just been updated. Please send out a request for code review if you think the patch is ready. But I think no conflicts can be guaranteed, since conflicts are not raised by one patch. If no conflict appears to this patch, then there will be conflicts for other patches. 3. Please don't commit code so rashly. Code from Ashish could easily be committed by others without strict examination, which caused a lot of problems when we used it here: bugs, and incondite code that is hard to read and to extend. Perhaps the main reason is that Ashish is the leader of Hive. Another person, Namit, has often committed buggy or ugly code. I have a suggestion: more discussion and tests with the help of the open source community. Code quality would be raised that way. (I don't intend any personal attacks here.) Code review is really hard and boring work, and we can only say that the code is very likely to be error-free. A patch is committed with at least two persons' work: the patch submitter and the code reviewer. Sometimes errors in the code are really hard to find, either by eye or by tests, so please be more patient; bugs can be fixed soon after they are observed. And I agree with your suggestion on more discussion, so please comment on the jira pages for issues you think need more discussion and tests. BTW, I think as the hive community grows, there could be more discussions. So the first-priority issue should be how to enlarge the hive community and get more people involved in the discussion on the hive mailing list or jira. Regards, Min -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. My profile: http://www.linkedin.com/in/coderplay My blog: http://coderplay.javaeye.com
[jira] Work started: (HIVE-512) [GenericUDF] new string function ELT(N,str1,str2,str3,...)
[ https://issues.apache.org/jira/browse/HIVE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-512 started by Min Zhou. [GenericUDF] new string function ELT(N,str1,str2,str3,...) --- Key: HIVE-512 URL: https://issues.apache.org/jira/browse/HIVE-512 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-512.patch ELT(N,str1,str2,str3,...) Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD().
{noformat}
mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
        -> 'ej'
mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
        -> 'foo'
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.
[ https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-555: -- Attachment: HIVE-555-2.patch with unit tests. create temporary function support not only udf, but also udaf, genericudf, etc. Key: HIVE-555 URL: https://issues.apache.org/jira/browse/HIVE-555 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-555-1.patch, HIVE-555-2.patch Right now, the command 'create temporary function' only supports udf. We can also let users write their udaf and generic udf, and write generic udaf in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.
[ https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728989#action_12728989 ] Min Zhou commented on HIVE-555: --- 1. I thought it would be a common function for generic udf error prompts. 2. Is that required for an existing generic udf? Regardless, I'll do it. create temporary function support not only udf, but also udaf, genericudf, etc. Key: HIVE-555 URL: https://issues.apache.org/jira/browse/HIVE-555 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-555-1.patch, HIVE-555-2.patch Right now, the command 'create temporary function' only supports udf. We can also let users write their udaf and generic udf, and write generic udaf in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.
[ https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-555: -- Attachment: HIVE-555-3.patch patch following namit's comments. create temporary function support not only udf, but also udaf, genericudf, etc. Key: HIVE-555 URL: https://issues.apache.org/jira/browse/HIVE-555 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch Right now, the command 'create temporary function' only supports udf. We can also let users write their udaf and generic udf, and write generic udaf in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-555) create temporary function support not only udf, but also udaf, genericudf, etc.
[ https://issues.apache.org/jira/browse/HIVE-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729009#action_12729009 ] Min Zhou commented on HIVE-555: --- @Zheng It would involve some logic outside of the FunctionTask. Actually, the execute methods of all Task classes are defined to return an integer standing for a status code. So creating another jira for that issue would be better. Agree? create temporary function support not only udf, but also udaf, genericudf, etc. Key: HIVE-555 URL: https://issues.apache.org/jira/browse/HIVE-555 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-555-1.patch, HIVE-555-2.patch, HIVE-555-3.patch Right now, the command 'create temporary function' only supports udf. We can also let users write their udaf and generic udf, and write generic udaf in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-329) start and stop hive thrift server in daemon mode
[ https://issues.apache.org/jira/browse/HIVE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729012#action_12729012 ] Min Zhou commented on HIVE-329: --- start needs a port number, but stop doesn't. start and stop hive thrift server in daemon mode - Key: HIVE-329 URL: https://issues.apache.org/jira/browse/HIVE-329 Project: Hadoop Hive Issue Type: New Feature Components: Server Infrastructure Affects Versions: 0.3.0 Reporter: Min Zhou Attachments: daemon.patch I wrote two shell scripts to start and stop the hive thrift server more conveniently. usage:
bin/hive --service start-hive [HIVE_PORT]
bin/hive --service stop-hive
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727999#action_12727999 ] Min Zhou commented on HIVE-537: --- Zheng, how would you get a field value from an object without an ordinal? Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map) --- Key: HIVE-537 URL: https://issues.apache.org/jira/browse/HIVE-537 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Min Zhou Attachments: HIVE-537.1.patch There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use the Operator's parentID to distinguish them. However, that approach does not extend to more complex plans that might be needed in the future. We will support the union type like this:
{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On
deserialization, we will first read out the tag byte; then we know the
current type of the following object, so we can deserialize it
successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};

An example serialization format (using a delimited format, with ' ' as the
first-level delimiter and '=' as the second-level delimiter):
userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
123 1=login
123 0=243=helloworld
123 1=logout
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
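The "tag byte first, then the value" scheme described in the issue can be sketched with plain Java byte streams. This is a simplified model of the proposed format for a two-branch union<0:int,1:string>, not Hive's actual serde code; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TaggedUnion {
    /** Serialize union<0:int,1:string>: write the tag byte, then the value. */
    public static byte[] writeIntOrString(byte tag, Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(tag);                       // tag comes first
            if (tag == 0) out.writeInt((Integer) value);
            else out.writeUTF((String) value);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen on in-memory streams
        }
    }

    /** Deserialize: read the tag first, so we know which type follows. */
    public static Object read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            return tag == 0 ? (Object) in.readInt() : in.readUTF();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Reading the tag before the value is exactly what lets the deserializer pick the right branch without any out-of-band schema information, which is the point of the proposed format.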
[jira] Updated: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-537: -- Attachment: HIVE-537.1.patch Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map) --- Key: HIVE-537 URL: https://issues.apache.org/jira/browse/HIVE-537 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-537.1.patch There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use the Operator's parentID to distinguish them. However, that approach does not extend to more complex plans that might be needed in the future. We will support the union type like this:
{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On
deserialization, we will first read out the tag byte; then we know the
current type of the following object, so we can deserialize it
successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};

An example serialization format (using a delimited format, with ' ' as the
first-level delimiter and '=' as the second-level delimiter):
userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
123 1=login
123 0=243=helloworld
123 1=logout
{code}
-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725532#action_12725532 ] Min Zhou commented on HIVE-537: --- Even if UnionObjectInspector has been implemented, DynamicSerDe does not seem to support a schema with a union type, which thrift can't recognize. We must find a way to solve this; any suggestions? Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map) --- Key: HIVE-537 URL: https://issues.apache.org/jira/browse/HIVE-537 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Zheng Shao There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use the Operator's parentID to distinguish them. However, that approach does not extend to more complex plans that might be needed in the future. We will support the union type like this:
{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On
deserialization, we will first read out the tag byte; then we know the
current type of the following object, so we can deserialize it
successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};

An example serialization format (using a delimited format, with ' ' as the
first-level delimiter and '=' as the second-level delimiter):
userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
123 1=login
123 0=243=helloworld
123 1=logout
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725564#action_12725564 ] Min Zhou commented on HIVE-577: --- Passed all test cases on hadoop 0.17.0 - 0.19.1. return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch, HIVE-577.2.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-577: -- Attachment: HIVE-577.1.patch Can retrieve all columns' comments now. return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-577: -- Attachment: HIVE-577.2.patch @Prasad I considered the case you mentioned before uploading that patch; I just didn't know the meaning of the code. This patch should cope with the issue. return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch, HIVE-577.2.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725450#action_12725450 ] Min Zhou commented on HIVE-577: --- I guess it's cumbersome to deal with custom tables through the api currently provided by hive. The DDL for the schema should be changed from struct{ type1 col1, type2 col2 } to some format like struct{ struct{type1 col1, string comment1}, struct{type2 col2, string comment2} }; however, MetaStoreUtils.getDDLFromFieldSchema(structName, fieldSchemas) is not only used for getSchema(table). return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch, HIVE-577.2.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725450#action_12725450 ] Min Zhou edited comment on HIVE-577 at 6/29/09 8:15 PM: I guess it's cumbersome to deal with custom tables through the api currently provided by hive. The DDL for the table schema should be changed from struct{ type1 col1, type2 col2 } to some format like struct{ struct{type1 col1, string comment1}, struct{type2 col2, string comment2} }; however, MetaStoreUtils.getDDLFromFieldSchema(structName, fieldSchemas) is not only used for getSchema(table). was (Author: coderplay): I guessed it's cumbersome to deal with custom tables from current api provided by hive currently. ddl for schema should changed from struct{ type1 col1, type2 col2} to some format like struct{ struct{type1 col1, string comment1}, struct{type2 col2, string comment2}} however, MetaStoreUtils.getDDLFromFieldSchema(structName, fieldSchemas) is not only for getSchema(table). return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch, HIVE-577.2.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725473#action_12725473 ] Min Zhou commented on HIVE-577: --- Any suggestions on this, or will you accept the 2nd patch, Prasad? return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou Attachments: HIVE-577.1.patch, HIVE-577.2.patch The comment of each column has not been retrieved correctly right now; FieldSchema.getComment() will return a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724916#action_12724916 ] Min Zhou commented on HIVE-537: --- We've done a test on this issue. Dataset: 700m records. With the first approach, each distinct count needs 119 seconds, which means 10 distinct counts need at least 1190 seconds. With the second approach, where distinct keys are distinguished by a tag, 10 distinct counts need 148 seconds. Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map) --- Key: HIVE-537 URL: https://issues.apache.org/jira/browse/HIVE-537 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Zheng Shao There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use the Operator's parentID to distinguish them. However, that approach does not extend to more complex plans that might be needed in the future. We will support the union type like this:
{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On
deserialization, we will first read out the tag byte; then we know the
current type of the following object, so we can deserialize it
successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};
{code}
-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
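The tag-prefixed serialized format proposed in HIVE-537 above can be illustrated with a few lines of plain Java. This is a hedged sketch, not Hive code: the class and method names (UnionTagSketch, writeUnion, readUnion) are invented for illustration, and only a union<0:int,1:double> is modeled.

```java
import java.io.*;

public class UnionTagSketch {
    // Serialize a union<0:int,1:double> value: the tag byte goes first,
    // then the payload, exactly as the HIVE-537 proposal describes.
    public static byte[] writeUnion(byte tag, Object value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeByte(tag);                 // tag first
            if (tag == 0) {
                out.writeInt((Integer) value);  // tag 0 -> int payload
            } else {
                out.writeDouble((Double) value); // tag 1 -> double payload
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen on byte arrays
        }
    }

    // Deserialize: read the tag first, then dispatch on it to pick the type.
    public static Object readUnion(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            return (tag == 0) ? (Object) in.readInt() : (Object) in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key property is that the reader never needs out-of-band type information: the one-byte tag selects the ObjectInspector for whatever follows.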
[jira] Commented: (HIVE-576) complete jdbc driver
[ https://issues.apache.org/jira/browse/HIVE-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724060#action_12724060 ] Min Zhou commented on HIVE-576: --- Done / to do:
# removed all useless comments auto-generated by Eclipse
# added APL (Apache license) statements to each file
# fixed a bug where SemanticAnalyzer.getSchema() fails after select-all queries on partitioned tables, i.e. queries like select * from tbl where partition_name=value
# implemented HiveResultSetMetadata and HiveDatabaseMetadata
# HiveResultSet now supports getXXX(columnName)
# removed JdbcSessionState, which was no longer used
# supported SQL Explorer for manipulating Hive data through a GUI
# to do: implement HivePreparedStatement and HiveCallableStatement
complete jdbc driver Key: HIVE-576 URL: https://issues.apache.org/jira/browse/HIVE-576 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Min Zhou Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-576.1.patch, HIVE-576.2.patch, sqlexplorer.jpg Hive only supports a few JDBC interfaces; let's complete it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: (was: tables.jpg) jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: (was: sqlexplorer.jpg) jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-576) complete jdbc driver
complete jdbc driver Key: HIVE-576 URL: https://issues.apache.org/jira/browse/HIVE-576 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Min Zhou Fix For: 0.4.0 Hive only supports a few JDBC interfaces; let's complete it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723526#action_12723526 ] Min Zhou commented on HIVE-567: --- It's not elegant to get the schema from the HiveServer by means of adding a function getFullDDLFromFieldSchema. jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou The comment of each column is not retrieved correctly right now; FieldSchema.getComment() returns a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-573) TestHiveServer broken
[ https://issues.apache.org/jira/browse/HIVE-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723841#action_12723841 ] Min Zhou commented on HIVE-573: --- Using JSON through Avro is a good approach here, but it makes things more complex: SerDe (although it is not an RPC), Thrift, and Avro amount to three duplications of the same work. TestHiveServer broken - Key: HIVE-573 URL: https://issues.apache.org/jira/browse/HIVE-573 Project: Hadoop Hive Issue Type: Bug Components: Server Infrastructure Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-573.1.patch This was after the change to HIVE-567 was committed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (HIVE-577) return correct comment of a column from ThriftHiveMetastore.Iface.get_fields
[ https://issues.apache.org/jira/browse/HIVE-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-577 started by Min Zhou. return correct comment of a column from ThriftHiveMetastore.Iface.get_fields Key: HIVE-577 URL: https://issues.apache.org/jira/browse/HIVE-577 Project: Hadoop Hive Issue Type: Sub-task Reporter: Min Zhou Assignee: Min Zhou The comment of each column is not retrieved correctly right now; FieldSchema.getComment() returns a string from the deserializer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-574) Hive should use ClassLoader from hadoop Configuration
[ https://issues.apache.org/jira/browse/HIVE-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723874#action_12723874 ] Min Zhou commented on HIVE-574: --- +1 for Zheng, thanks. It worked fine here; nothing abnormal. Hive should use ClassLoader from hadoop Configuration - Key: HIVE-574 URL: https://issues.apache.org/jira/browse/HIVE-574 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.3.0, 0.3.1 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-574.1.patch, HIVE-574.2.patch, HIVE-574.3.patch See HIVE-338. Hive should always use the getClassByName method from hadoop Configuration, so that we choose the correct ClassLoader. Examples include all plug-in interfaces, including UDF/GenericUDF/UDAF, SerDe, and FileFormats. Basically the following code snippet shows the idea:

{code}
package org.apache.hadoop.conf;

public class Configuration implements Iterable<Map.Entry<String,String>> {
  ...
  /**
   * Load a class by name.
   *
   * @param name the class name.
   * @return the class object.
   * @throws ClassNotFoundException if the class is not found.
   */
  public Class<?> getClassByName(String name) throws ClassNotFoundException {
    return Class.forName(name, true, classLoader);
  }
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
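A self-contained sketch of the pattern HIVE-574 recommends: resolve classes through a ClassLoader captured from the thread context, as Configuration.getClassByName does, rather than calling Class.forName(name) directly with the caller's defining loader. The class name ConfSketch and the resolves helper are invented for illustration; this is not Hadoop code.

```java
public class ConfSketch {
    private final ClassLoader classLoader;

    public ConfSketch() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        // Fall back to this class's defining loader when no context loader is set.
        this.classLoader = (cl != null) ? cl : ConfSketch.class.getClassLoader();
    }

    // Mirrors Configuration.getClassByName: resolve via the captured loader.
    public Class<?> getClassByName(String name) throws ClassNotFoundException {
        return Class.forName(name, true, classLoader);
    }

    // Convenience predicate: does the captured loader resolve the name?
    public static boolean resolves(String name) {
        try {
            return new ConfSketch().getClassByName(name) != null;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

The point of routing every plug-in lookup (UDFs, SerDes, FileFormats) through one such method is that swapping the context loader, e.g. after an `add jar`, changes what all of them can see.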
[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata
[ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-559: -- Issue Type: Sub-task (was: New Feature) Parent: HIVE-576 Support JDBC ResultSetMetadata -- Key: HIVE-559 URL: https://issues.apache.org/jira/browse/HIVE-559 Project: Hadoop Hive Issue Type: Sub-task Components: Clients Reporter: Bill Graham Assignee: Min Zhou Support ResultSetMetadata for JDBC ResultSets. The getColumn* methods would be particularly useful I'd expect: http://java.sun.com/javase/6/docs/api/java/sql/ResultSetMetaData.html The challenge as I see it though, is that the JDBC client only has access to the raw query string and the result data when running in standalone mode. Therefore, it will need to get the column metadata one of two ways: 1. By parsing the query to determine the tables/columns involved and then making a request to the metastore to get the metadata for the columns. This certainly feels like duplicate work, since the query of course gets properly parsed on the server. 2. By returning the column metadata from the server. My thrift knowledge is limited, but I suspect adding this to the response would present other challenges. Any thoughts or suggestions? Option #1 feels clunkier, yet safer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: sqlexplorer.jpg jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, sqlexplorer.jpg, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-338) Executing cli commands into thrift server
[ https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723400#action_12723400 ] Min Zhou commented on HIVE-338: --- Can you explain why you made a change to FunctionTask.java? It caused a java.lang.ClassNotFoundException when I executed my UDF. The ClassLoader did not work. Executing cli commands into thrift server - Key: HIVE-338 URL: https://issues.apache.org/jira/browse/HIVE-338 Project: Hadoop Hive Issue Type: Improvement Components: Server Infrastructure Affects Versions: 0.3.0 Reporter: Min Zhou Assignee: Zheng Shao Fix For: 0.4.0 Attachments: hive-338.final.patch, HIVE-338.postfix.1.patch, hiveserver-v1.patch, hiveserver-v2.patch, hiveserver-v3.patch Let thrift server support set, add/delete file/jar and normal HSQL query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HIVE-338) Executing cli commands into thrift server
[ https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12723400#action_12723400 ] Min Zhou edited comment on HIVE-338 at 6/23/09 7:28 PM: Can you explain why you made a change to FunctionTask.java? It caused a java.lang.ClassNotFoundException when I executed my UDF, where MR jobs were submitted by the Hive CLI. The ClassLoader did not work. was (Author: coderplay): Can you explain why you made a change to FunctionTask.java? It caused a java.lang.ClassNotFoundException when I executed my UDF. The ClassLoader did not work. Executing cli commands into thrift server - Key: HIVE-338 URL: https://issues.apache.org/jira/browse/HIVE-338 Project: Hadoop Hive Issue Type: Improvement Components: Server Infrastructure Affects Versions: 0.3.0 Reporter: Min Zhou Assignee: Zheng Shao Fix For: 0.4.0 Attachments: hive-338.final.patch, HIVE-338.postfix.1.patch, hiveserver-v1.patch, hiveserver-v2.patch, hiveserver-v3.patch Let thrift server support set, add/delete file/jar and normal HSQL query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: tables.jpg jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou reassigned HIVE-567: - Assignee: Min Zhou (was: Raghotham Murthy) jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Min Zhou Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: (was: result.jpg) jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Min Zhou Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Attachment: result.jpg jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Min Zhou Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou reassigned HIVE-567: - Assignee: Raghotham Murthy (was: Min Zhou) incorrect manipulation jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-567) jdbc: integrate hive with pentaho report designer
[ https://issues.apache.org/jira/browse/HIVE-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-567: -- Comment: was deleted (was: incorrect manipulation) jdbc: integrate hive with pentaho report designer - Key: HIVE-567 URL: https://issues.apache.org/jira/browse/HIVE-567 Project: Hadoop Hive Issue Type: Improvement Components: Clients Reporter: Raghotham Murthy Assignee: Raghotham Murthy Fix For: 0.4.0 Attachments: hive-567-server-output.txt, hive-567.1.patch, hive-567.2.patch, hive-567.3.patch, hive-pentaho.tgz, result.jpg, tables.jpg Instead of trying to get a complete implementation of jdbc, it's probably more useful to pick reporting/analytics software out there and implement the jdbc methods necessary to get them working. This jira is a first attempt at this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721169#action_12721169 ] Min Zhou commented on HIVE-521: --- I didn't expect all tests to pass, due to the missing class BinaryComparable. The reason for the failure has nothing to do with this JIRA. You can check out trunk and run ant -Dhadoop.version=0.17.0 test -Doverwrite=true and the error message will be displayed.
...
[junit] Exception: org/apache/hadoop/io/BinaryComparable
[junit] java.lang.NoClassDefFoundError: org/apache/hadoop/io/BinaryComparable
[junit] at java.lang.Class.getDeclaredConstructors0(Native Method)
[junit] at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
[junit] at java.lang.Class.getConstructor0(Class.java:2699)
[junit] at java.lang.Class.newInstance0(Class.java:326)
[junit] at java.lang.Class.newInstance(Class.java:308)
[junit] at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getUDFMethod(FunctionRegistry.java:309)
[junit] at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDesc(TypeCheckProcFactory.java:451)
[junit] at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:558)
[junit] at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:653)
[junit] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:80)
[junit] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:83)
[junit] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:116)
[junit] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:95)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:3922)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1000)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:986)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3163)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:3610)
[junit] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:3840)
[junit] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
[junit] at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:44)
[junit] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
[junit] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:177)
[junit] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:209)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
[junit] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
[junit] at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:471)
[junit] at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity(TestCliDriver.java:726)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
[junit] Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.BinaryComparable
[junit] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
[junit
[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721231#action_12721231 ] Min Zhou commented on HIVE-521: --- @ HIVE-521-all-v7.patch
# {code:java}
boolean conditionTypeIsOk = (arguments[0].getCategory() == ObjectInspector.Category.PRIMITIVE);
if (conditionTypeIsOk) {
  PrimitiveObjectInspector poi = (PrimitiveObjectInspector) arguments[0];
  conditionTypeIsOk = (poi.getPrimitiveCategory() == PrimitiveObjectInspector.PrimitiveCategory.BOOLEAN
      || poi.getPrimitiveCategory() == PrimitiveObjectInspector.PrimitiveCategory.VOID);
}
if (!conditionTypeIsOk) {
  throw new UDFArgumentTypeException(0, "The first argument of function IF should be \""
      + Constants.BOOLEAN_TYPE_NAME + "\", but \"" + arguments[0].getTypeName() + "\" is found");
}
{code}
# {code:java}
String typeName = arguments[0].getTypeName();
if (!typeName.equals(Constants.BOOLEAN_TYPE_NAME)
    && !typeName.equals(Constants.VOID_TYPE_NAME)) {
  throw new UDFArgumentTypeException(0, "The first expression of function IF is expected to be \""
      + Constants.BOOLEAN_TYPE_NAME + "\", but \"" + arguments[0].getTypeName() + "\" is found");
}
{code}
I think the 2nd approach is more concise; do you agree? Move size, if, isnull, isnotnull to GenericUDF -- Key: HIVE-521 URL: https://issues.apache.org/jira/browse/HIVE-521 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, HIVE-521-all-v6.patch, HIVE-521-all-v7.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch See HIVE-511 for an example of the move. size, if, isnull, isnotnull are all implemented with UDF but they are actually working on variable types of objects. We should move them to GenericUDF for better type handling.
This also helps to clean up the hack in doing type matching/type conversion in UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
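One caveat on the second, string-based variant discussed above: the two negated equals checks only express "neither boolean nor void" when joined with &&; a disjunction of the negations is true for every type name, including "boolean" itself. A minimal sketch of the intended guard predicate (the class name and constant values here are illustrative, not Hive's):

```java
public class IfTypeCheckSketch {
    // Stand-ins for Constants.BOOLEAN_TYPE_NAME / Constants.VOID_TYPE_NAME.
    static final String BOOLEAN_TYPE_NAME = "boolean";
    static final String VOID_TYPE_NAME = "void";

    // True when the first IF argument's type is invalid and should be rejected.
    // Joining the negations with || instead would reject every type name.
    public static boolean rejectConditionType(String typeName) {
        return !typeName.equals(BOOLEAN_TYPE_NAME)
            && !typeName.equals(VOID_TYPE_NAME);
    }
}
```

This is just De Morgan's law: not(boolean or void) is (not boolean) and (not void).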
[jira] Commented: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12721595#action_12721595 ] Min Zhou commented on HIVE-521: --- OK, we are splitting hairs. All tests passed here; let's commit it. +1 Move size, if, isnull, isnotnull to GenericUDF -- Key: HIVE-521 URL: https://issues.apache.org/jira/browse/HIVE-521 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, HIVE-521-all-v6.patch, HIVE-521-all-v7.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch See HIVE-511 for an example of the move. size, if, isnull, isnotnull are all implemented with UDF but they are actually working on variable types of objects. We should move them to GenericUDF for better type handling. This also helps to clean up the hack in doing type matching/type conversion in UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-521: -- Attachment: HIVE-521-all-v5.patch Move size, if, isnull, isnotnull to GenericUDF -- Key: HIVE-521 URL: https://issues.apache.org/jira/browse/HIVE-521 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch See HIVE-511 for an example of the move. size, if, isnull, isnotnull are all implemented with UDF but they are actually working on variable types of objects. We should move them to GenericUDF for better type handling. This also helps to clean up the hack in doing type matching/type conversion in UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-521: -- Attachment: HIVE-521-all-v6.patch Move size, if, isnull, isnotnull to GenericUDF -- Key: HIVE-521 URL: https://issues.apache.org/jira/browse/HIVE-521 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-all-v5.patch, HIVE-521-all-v6.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch See HIVE-511 for an example of the move. size, if, isnull, isnotnull are all implemented with UDF but they are actually working on variable types of objects. We should move them to GenericUDF for better type handling. This also helps to clean up the hack in doing type matching/type conversion in UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-564) sweep the non-open source elements from hive
sweep the non-open source elements from hive Key: HIVE-564 URL: https://issues.apache.org/jira/browse/HIVE-564 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Min Zhou Fix For: 0.4.0 There are some non-open-source things from Facebook in the current version of Hive. We should replace them with open-source versions of fb303.jar, libthrift.jar, etc., so that the open-source community is more able to amend the relevant code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-521) Move size, if, isnull, isnotnull to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated HIVE-521: -- Attachment: HIVE-521-all-v4.patch passed tests on hadoop version 0.17.0. Move size, if, isnull, isnotnull to GenericUDF -- Key: HIVE-521 URL: https://issues.apache.org/jira/browse/HIVE-521 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Min Zhou Fix For: 0.4.0 Attachments: HIVE-521-all-v1.patch, HIVE-521-all-v2.patch, HIVE-521-all-v3.patch, HIVE-521-all-v4.patch, HIVE-521-IF-2.patch, HIVE-521-IF-3.patch, HIVE-521-IF-4.patch, HIVE-521-IF-5.patch, HIVE-521-IF.patch See HIVE-511 for an example of the move. size, if, isnull, isnotnull are all implemented with UDF but they are actually working on variable types of objects. We should move them to GenericUDF for better type handling. This also helps to clean up the hack in doing type matching/type conversion in UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-338) Executing cli commands into thrift server
[ https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720462#action_12720462 ] Min Zhou commented on HIVE-338: --- I think you should take a look at these lines of org.apache.hadoop.conf.Configuration:

{code:java}
private ClassLoader classLoader;
{
  classLoader = Thread.currentThread().getContextClassLoader();
  if (classLoader == null) {
    classLoader = Configuration.class.getClassLoader();
  }
}
...
public Class<?> getClassByName(String name) throws ClassNotFoundException {
  return Class.forName(name, true, classLoader);
}
{code}

The ClassLoader of the current thread changes when jars are added to the classpath, but conf does not pick up that change. Executing cli commands into thrift server - Key: HIVE-338 URL: https://issues.apache.org/jira/browse/HIVE-338 Project: Hadoop Hive Issue Type: Improvement Components: Server Infrastructure Affects Versions: 0.3.0 Reporter: Min Zhou Assignee: Min Zhou Attachments: hiveserver-v1.patch, hiveserver-v2.patch, hiveserver-v3.patch Let thrift server support set, add/delete file/jar and normal HSQL query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
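The staleness Min describes above can be shown in a few self-contained lines: a loader captured at construction time does not track a later setContextClassLoader call, which is what happens when `add jar` installs a new URLClassLoader on the session thread. The class and method names here are invented for illustration; this is not Hive or Hadoop code.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class StaleLoaderSketch {
    // Captures the context ClassLoader once, the way Configuration does.
    private final ClassLoader captured = Thread.currentThread().getContextClassLoader();

    public boolean seesCurrentLoader() {
        return captured == Thread.currentThread().getContextClassLoader();
    }

    // Returns true when the captured loader has gone stale after a swap.
    public static boolean demoStaleness() {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        StaleLoaderSketch conf = new StaleLoaderSketch(); // captures `original`
        try {
            // Simulate "add jar": install a fresh loader on the thread.
            t.setContextClassLoader(new URLClassLoader(new URL[0], original));
            return !conf.seesCurrentLoader(); // captured copy no longer matches
        } finally {
            t.setContextClassLoader(original); // restore the thread's loader
        }
    }
}
```

This is why re-reading the thread context loader at lookup time (or refreshing the cached one after `add jar`) matters for UDF resolution.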
[jira] Commented: (HIVE-556) let hive support theta join
[ https://issues.apache.org/jira/browse/HIVE-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719518#action_12719518 ]

Min Zhou commented on HIVE-556:
-------------------------------

I didn't see any filter there; hive will put all fields of my small table into the HTree.

{noformat}
hive> explain select /*+ MAPJOIN(a) */ a.url_pattern, w.url
      from application a join web_log w
      where w.logdate='20090611' and w.url rlike a.url_pattern and a.dt='20090609';

Common Join Operator
  condition map:
       Inner Join 0 to 1
  condition expressions:
    0 {bussiness_id} {subclass_id} {class_id} {note} {name} {url_pattern} {dt}
    1
{noformat}

We only put a.url_pattern into a HashMap in our raw map-reduce implementation.

let hive support theta join
---------------------------

                Key: HIVE-556
                URL: https://issues.apache.org/jira/browse/HIVE-556
            Project: Hadoop Hive
         Issue Type: New Feature
   Affects Versions: 0.4.0
           Reporter: Min Zhou
            Fix For: 0.4.0

Right now, hive only supports equi-joins. Sometimes that's not enough; we should consider implementing theta joins like:

{code:sql}
SELECT a.subid, a.id, t.url
FROM tbl t JOIN aux_tbl a ON t.url rlike a.url_pattern
WHERE t.dt='20090609' AND a.dt='20090609';
{code}

Any condition expression following 'ON' would be appropriate.
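[Editor's note] The "raw map-reduce implementation" mentioned above can be sketched in plain Java: only the small table's join key (url_pattern) is held in memory, and each streamed row is matched with an unanchored regex, mimicking rlike. Class name and table contents are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the hand-written map-side theta join described above:
// keep only url_pattern from the small table in memory, stream the
// big table (web_log), and match each url with a regex. Hive's rlike
// is an unanchored match, hence Matcher.find() rather than matches().
public class MapSideThetaJoin {
    public static List<String[]> join(List<String> patterns, List<String> urls) {
        // Compile once; this is the whole in-memory footprint of the small side.
        List<Pattern> compiled = new ArrayList<>();
        for (String p : patterns) compiled.add(Pattern.compile(p));

        List<String[]> out = new ArrayList<>();
        for (String url : urls) {                            // streamed big side
            for (int i = 0; i < compiled.size(); i++) {
                if (compiled.get(i).matcher(url).find()) {   // rlike semantics
                    out.add(new String[]{patterns.get(i), url});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = join(
            Arrays.asList("item\\.example\\.com", "/auction/"),
            Arrays.asList("http://item.example.com/x", "http://other/y"));
        for (String[] r : rows) System.out.println(r[0] + " -> " + r[1]);
    }
}
```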
[jira] Assigned: (HIVE-559) Support JDBC ResultSetMetadata
[ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou reassigned HIVE-559:
-----------------------------

    Assignee: Min Zhou

Support JDBC ResultSetMetadata
------------------------------

                Key: HIVE-559
                URL: https://issues.apache.org/jira/browse/HIVE-559
            Project: Hadoop Hive
         Issue Type: New Feature
         Components: Clients
           Reporter: Bill Graham
           Assignee: Min Zhou

Support ResultSetMetadata for JDBC ResultSets. The getColumn* methods would be particularly useful, I'd expect:
http://java.sun.com/javase/6/docs/api/java/sql/ResultSetMetaData.html

The challenge as I see it, though, is that the JDBC client only has access to the raw query string and the result data when running in standalone mode. Therefore, it will need to get the column metadata in one of two ways:
1. By parsing the query to determine the tables/columns involved and then making a request to the metastore to get the metadata for those columns. This certainly feels like duplicate work, since the query of course gets properly parsed on the server.
2. By returning the column metadata from the server. My thrift knowledge is limited, but I suspect adding this to the response would present other challenges.

Any thoughts or suggestions? Option #1 feels clunkier, yet safer.
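[Editor's note] A rough shape for option #2: if the server shipped column names/types back with the result, the client could answer the standard getColumn* calls from that data, e.g. via a dynamic proxy over java.sql.ResultSetMetaData. The column data and class names below are made up; this is not Hive's actual thrift interface:

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Sketch of option #2: wrap server-supplied column metadata in a
// dynamic proxy implementing java.sql.ResultSetMetaData, so the
// client answers getColumn* calls without re-parsing the query.
public class MetaDemo {
    // Hypothetical payload the server could return alongside results.
    static final String[] NAMES = {"col_a", "col_b"};
    static final String[] TYPES = {"string", "int"};

    static ResultSetMetaData fromServer() {
        return (ResultSetMetaData) Proxy.newProxyInstance(
            MetaDemo.class.getClassLoader(),
            new Class<?>[]{ResultSetMetaData.class},
            (proxy, method, args) -> {
                switch (method.getName()) {          // JDBC columns are 1-based
                    case "getColumnCount":    return NAMES.length;
                    case "getColumnName":     return NAMES[(Integer) args[0] - 1];
                    case "getColumnTypeName": return TYPES[(Integer) args[0] - 1];
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    // Render "name type" pairs, hiding the checked SQLException for the demo.
    static String describe() {
        try {
            ResultSetMetaData md = fromServer();
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                if (i > 1) sb.append(", ");
                sb.append(md.getColumnName(i)).append(' ').append(md.getColumnTypeName(i));
            }
            return sb.toString();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // col_a string, col_b int
    }
}
```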
[jira] Issue Comment Edited: (HIVE-474) Support for distinct selection on two or more columns
[ https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719368#action_12719368 ]

Min Zhou edited comment on HIVE-474 at 6/14/09 7:02 PM:
--------------------------------------------------------

I think there is another special case here. If the query has multiple distinct operations on the same column, we can push down the evaluation of those expressions into the reducers.

{code}
Query:
  select a,
         count(distinct if(condition, b, null))  as col1,
         count(distinct if(!condition, null, b)) as col2,
         count(distinct b)                       as col3

Plan:
  Job:
    Map side:
      Emit:
        distribution_key: a
        sort_key: a, b
        value: nothing
    Reduce side:
      Group by a; count col1, col2, col3 by evaluating their expressions
{code}

was (Author: coderplay):
I think there is another special case here. If the query has multiple distinct operations on the same column, we can push down the evaluation of those expressions into the reducers.

Query: select a, count(distinct if(condition, b, null)) as col1, count(distinct if(!condition, null, b)) as col2, count(distinct b) as col3
Plan: Job: Map side: Emit: distribution_key: a, sort_key: a, b, value: nothing. Reduce side: Group by a; count col1, col2, col3 by evaluating their expressions.

Support for distinct selection on two or more columns
-----------------------------------------------------

                Key: HIVE-474
                URL: https://issues.apache.org/jira/browse/HIVE-474
            Project: Hadoop Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Alexis Rondeau

The ability to select distinct over several individual columns, for example:

{code:sql}
select count(distinct user), count(distinct session) from actions;
{code}

currently returns the following failure:

{noformat}
FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns not Supported user
{noformat}
[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns
[ https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719368#action_12719368 ]

Min Zhou commented on HIVE-474:
-------------------------------

I think there is another special case here. If the query has multiple distinct operations on the same column, we can push down the evaluation of those expressions into the reducers.

{code}
Query:
  select a,
         count(distinct if(condition, b, null))  as col1,
         count(distinct if(!condition, null, b)) as col2,
         count(distinct b)                       as col3

Plan:
  Job:
    Map side:
      Emit:
        distribution_key: a
        sort_key: a, b
        value: nothing
    Reduce side:
      Group by a; count col1, col2, col3 by evaluating their expressions
{code}

Support for distinct selection on two or more columns
-----------------------------------------------------

                Key: HIVE-474
                URL: https://issues.apache.org/jira/browse/HIVE-474
            Project: Hadoop Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Alexis Rondeau

The ability to select distinct over several individual columns, for example:

{code:sql}
select count(distinct user), count(distinct session) from actions;
{code}

currently returns the following failure:

{noformat}
FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns not Supported user
{noformat}
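[Editor's note] The reduce side of the plan sketched above can be illustrated in plain Java: because rows for one group arrive sorted by (a, b), several count(distinct expr(b)) aggregates over the same column b can be computed in a single pass. The predicate (b starts with "x") is a made-up stand-in for the if() conditions in the query:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the reduce-side evaluation: the reducer walks the sorted
// b values for one group, evaluates each distinct expression once per
// distinct b, and counts non-null results (count(distinct) ignores
// nulls). All names and the predicate are hypothetical.
public class MultiDistinctReducer {
    public static int[] reduce(List<String> sortedB) {
        int col1 = 0, col2 = 0, col3 = 0;
        String prev = null;
        for (String b : sortedB) {
            if (b.equals(prev)) continue;        // duplicates are adjacent after the sort
            prev = b;
            boolean cond = b.startsWith("x");    // hypothetical condition
            String v1 = cond ? b : null;         // if(condition, b, null)
            String v2 = !cond ? null : b;        // if(!condition, null, b)
            if (v1 != null) col1++;
            if (v2 != null) col2++;
            col3++;                              // count(distinct b)
        }
        return new int[]{col1, col2, col3};
    }

    public static void main(String[] args) {
        int[] c = reduce(Arrays.asList("a1", "a1", "b2", "x1", "x9", "x9"));
        System.out.println(c[0] + " " + c[1] + " " + c[2]); // 2 2 4
    }
}
```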
[jira] Issue Comment Edited: (HIVE-338) Executing cli commands into thrift server
[ https://issues.apache.org/jira/browse/HIVE-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717651#action_12717651 ]

Min Zhou edited comment on HIVE-338 at 6/10/09 11:37 PM:
---------------------------------------------------------

* exec/FunctionTask.java: is it necessary to specify the loader in the Class.forName call? I thought that the current thread context loader was always the first loader to be tried anyway during name resolution.

Yes, of course. The class loader held by HiveConf is older than that of the current thread.

This patch supports dfs, add/delete file/jar, and set now.

btw, Joydeep, would you do me a favor and write some test code? I am not familiar with that. You know, 'add jar' needs a separate jar, and I'm not quite sure how to organize them.

was (Author: coderplay):
* exec/FunctionTask.java: is it necessary to specify the loader in the Class.forName call? I thought that the current thread context loader was always the first loader to be tried anyway during name resolution.

Yes, of course. The class loader held by HiveConf is older than that of the current thread.

This patch supports dfs, add/delete file/jar, and set now.

btw, Joydeep, would you do me a favor and write some test code? I am not familiar with it. You know, 'add jar' needs a separate jar, and I'm not quite sure how to organize them.

Executing cli commands into thrift server
-----------------------------------------

                Key: HIVE-338
                URL: https://issues.apache.org/jira/browse/HIVE-338
            Project: Hadoop Hive
         Issue Type: Improvement
         Components: Server Infrastructure
   Affects Versions: 0.3.0
           Reporter: Min Zhou
        Attachments: hiveserver-v1.patch, hiveserver-v2.patch

Let the thrift server support set, add/delete file/jar, and normal HSQL queries.