[jira] [Resolved] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar resolved HIVE-9557. -- Resolution: Invalid > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Labels: CosineSimilarity, SimilarityMetric, UDF > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
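The worked example in the issue description, (2 - 1) / 2 = 0.5, matches a set-based cosine similarity over whitespace-separated tokens: the number of shared tokens divided by sqrt(|A|) * sqrt(|B|). The following is a minimal sketch under that assumption. The class and method names (`CosineSketch`, `tokens`, `similarity`) are hypothetical and are not taken from any patch attached to this issue; returning 0 when a side has no tokens is also a choice made only for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

final class CosineSketch {
    // Hypothetical helper: split on runs of whitespace and keep distinct tokens.
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Set-based cosine similarity: |A intersect B| / (sqrt(|A|) * sqrt(|B|)).
    // Returns 0.0 when either side has no tokens (an assumption of this sketch).
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return common.size() / (Math.sqrt(ta.size()) * Math.sqrt(tb.size()));
    }
}
```

With 'Test String1' and 'Test String2', one of the two tokens on each side is shared, so the result is 1 / (sqrt(2) * sqrt(2)), which is 0.5 up to floating-point rounding.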
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.1.patch) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Labels: CosineSimilarity, SimilarityMetric, UDF > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.3.patch) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Labels: CosineSimilarity, SimilarityMetric, UDF > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.2.patch) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > Labels: CosineSimilarity, SimilarityMetric, UDF > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Attachment: (was: HIVE-11137.1.patch) > In DateWritable remove the use of LazyBinaryUtils > - > > Key: HIVE-11137 > URL: https://issues.apache.org/jira/browse/HIVE-11137 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > Currently the DateWritable class uses LazyBinaryUtils, which has a lot of > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Assignee: Owen O'Malley (was: Nishant Kelkar) > In DateWritable remove the use of LazyBinaryUtils > - > > Key: HIVE-11137 > URL: https://issues.apache.org/jira/browse/HIVE-11137 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11137.1.patch > > > Currently the DateWritable class uses LazyBinaryUtils, which has a lot of > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: udf_cosine_similarity-v01.patch) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610816#comment-14610816 ]

Nishant Kelkar commented on HIVE-11137:
---------------------------------------

From the hive.log, I see the following two issues:
{code}
2015-07-01 11:13:28,877 ERROR [Thread-17]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(101)) - Error starting HiveServer2: could not start ThriftBinaryCLIService
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:1.
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:109)
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:91)
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:87)
    at org.apache.hive.service.auth.HiveAuthFactory.getServerSocket(HiveAuthFactory.java:241)
    at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:66)
    at java.lang.Thread.run(Thread.java:744)
{code}
and
{code}
2015-07-01 11:13:18,009 DEBUG [main]: util.Shell (Shell.java:checkHadoopHome(320)) - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2375)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:366)
    at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
    at org.apache.hive.service.auth.TestCustomAuthentication.setUp(TestCustomAuthentication.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}

> In DateWritable remove the use of LazyBinaryUtils
> -------------------------------------------------
>
> Key: HIVE-11137
> URL: https://issues.apache.org/jira/browse/HIVE-11137
> Project: Hive
> Issue Type: Sub-task
> Reporter: Owen O'Malley
> Assignee: Nishant Kelkar
> Attachments: HIVE-11137.1.patch
>
> Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610813#comment-14610813 ] Nishant Kelkar commented on HIVE-11137: --- Is this an unrelated test failure? > In DateWritable remove the use of LazyBinaryUtils > - > > Key: HIVE-11137 > URL: https://issues.apache.org/jira/browse/HIVE-11137 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Nishant Kelkar > Attachments: HIVE-11137.1.patch > > > Currently the DateWritable class uses LazyBinaryUtils, which has a lot of > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11105) NegativeArraySizeException from org.apache.hadoop.io.BytesWritable.setCapacity during serialization phase
[ https://issues.apache.org/jira/browse/HIVE-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609905#comment-14609905 ]

Nishant Kelkar commented on HIVE-11105:
---------------------------------------

What version of Hive are you using?

> NegativeArraySizeException from org.apache.hadoop.io.BytesWritable.setCapacity during serialization phase
> ---------------------------------------------------------------------------------------------------------
>
> Key: HIVE-11105
> URL: https://issues.apache.org/jira/browse/HIVE-11105
> Project: Hive
> Issue Type: Bug
> Reporter: Priyesh Raj
>
> I am getting the exception while running a query on very large data set. The
> issue is coming in Hive, however my understanding is it's a hadoop
> setCapacity function problem. The variable definition is integer and it is
> not able to handle such a large count.
> Please look into it.
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138)
>     ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:336)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082)
>     ... 14 more
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
>     at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
>     at org.apache.hadoop.io.BytesWritable.set(BytesWritable.java:171)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:213)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:456)
>     at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:316)
> {code}
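The reporter's diagnosis (an integer that cannot hold a large count) can be illustrated with a short sketch. It assumes the buffer grows to 1.5 times the requested size, computed as `size * 3 / 2` in `int` arithmetic; that growth rule is an assumption about the Hadoop version in use and is not confirmed anywhere in this thread. The class and method names are hypothetical.

```java
final class CapacityOverflowSketch {
    // Assumed growth rule (not confirmed by the thread): 1.5x the requested
    // size, evaluated entirely in int arithmetic. size * 3 wraps around once
    // size exceeds Integer.MAX_VALUE / 3, so the result can be negative.
    static int grownCapacityInt(int size) {
        return size * 3 / 2;
    }

    // The same rule evaluated in long arithmetic and clamped to the int range.
    static int grownCapacitySafe(int size) {
        return (int) Math.min((long) Integer.MAX_VALUE, 3L * size / 2L);
    }
}
```

Under this assumption, a requested size of 800,000,000 bytes yields a negative capacity from the `int` version, and allocating a `byte[]` with a negative length throws `NegativeArraySizeException`, which is the innermost cause in the quoted trace.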
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609881#comment-14609881 ] Nishant Kelkar commented on HIVE-11137: --- BTW, let me know if submitting patch != taking ownership of task in general. That way, I can hand it back to you (still learning all the rules here). Thank you! > In DateWritable remove the use of LazyBinaryUtils > - > > Key: HIVE-11137 > URL: https://issues.apache.org/jira/browse/HIVE-11137 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Nishant Kelkar > Attachments: HIVE-11137.1.patch > > > Currently the DateWritable class uses LazyBinaryUtils, which has a lot of > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Attachment: HIVE-11137.1.patch Submitting revision #1 patch. > In DateWritable remove the use of LazyBinaryUtils > - > > Key: HIVE-11137 > URL: https://issues.apache.org/jira/browse/HIVE-11137 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11137.1.patch > > > Currently the DateWritable class uses LazyBinaryUtils, which has a lot of > dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishant Kelkar updated HIVE-9557:
---------------------------------
    Attachment: HIVE-9557.3.patch

Attaching revision #3 of the patch, which removes the hidden dependency on FastMath from commons-math3 (it comes in via the org.apache.spark:spark-core_2.10 dependency). The patch uses the standard library's Math instead.

> create UDF to measure strings similarity using Cosine Similarity algo
> ---------------------------------------------------------------------
>
> Key: HIVE-9557
> URL: https://issues.apache.org/jira/browse/HIVE-9557
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Reporter: Alexander Pivovarov
> Assignee: Nishant Kelkar
> Labels: CosineSimilarity, SimilarityMetric, UDF
> Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch, udf_cosine_similarity-v01.patch
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605940#comment-14605940 ]

Nishant Kelkar commented on HIVE-11137:
---------------------------------------

LazyBinaryUtils is used only for readVInt() and writeVInt(). Relevant sections of code from LazyBinaryUtils:
{code}
private static ThreadLocal<byte[]> vLongBytesThreadLocal = new ThreadLocal<byte[]>() {
  @Override
  public byte[] initialValue() {
    return new byte[9];
  }
};

public static void writeVLong(RandomAccessOutput byteStream, long l) {
  byte[] vLongBytes = vLongBytesThreadLocal.get();
  int len = LazyBinaryUtils.writeVLongToByteArray(vLongBytes, l);
  byteStream.write(vLongBytes, 0, len);
}
{code}
{code}
/**
 * Reads a zero-compressed encoded int from a byte array and returns it.
 *
 * @param bytes
 *          the byte array
 * @param offset
 *          offset of the array to read from
 * @param vInt
 *          storing the deserialized int and its size in byte
 */
public static void readVInt(byte[] bytes, int offset, VInt vInt) {
  byte firstByte = bytes[offset];
  vInt.length = (byte) WritableUtils.decodeVIntSize(firstByte);
  if (vInt.length == 1) {
    vInt.value = firstByte;
    return;
  }
  int i = 0;
  for (int idx = 0; idx < vInt.length - 1; idx++) {
    byte b = bytes[offset + 1 + idx];
    i = i << 8;
    i = i | (b & 0xFF);
  }
  vInt.value = (WritableUtils.isNegativeVInt(firstByte) ? (i ^ -1) : i);
}
{code}
I could contribute a patch towards this task [~owen.omalley] (I'm a beginner contributor in Hive, looking around for work :)). Thanks and let me know!

> In DateWritable remove the use of LazyBinaryUtils
> -------------------------------------------------
>
> Key: HIVE-11137
> URL: https://issues.apache.org/jira/browse/HIVE-11137
> Project: Hive
> Issue Type: Sub-task
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies.
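To show how little code the two helpers actually need, here is a dependency-free sketch of the zero-compressed encoding that the quoted readVInt() decodes: values from -112 to 127 take one byte, and anything else is written as a length marker byte followed by 1 to 8 magnitude bytes. The class and method names (`VIntSketch`, `decodeSize`, `isNegative`) are hypothetical, and the marker-byte arithmetic is a reconstruction of the format as understood here, not code copied from Hive or Hadoop.

```java
final class VIntSketch {
    // Encodes v into buf starting at offset; returns the bytes written (1 to 9).
    static int writeVLong(byte[] buf, int offset, long v) {
        if (v >= -112 && v <= 127) {
            buf[offset] = (byte) v;          // small values fit in the first byte
            return 1;
        }
        int len = -112;                      // marker base for positive values
        if (v < 0) {
            v ^= -1L;                        // one's complement, now non-negative
            len = -120;                      // marker base for negative values
        }
        for (long tmp = v; tmp != 0; tmp >>= 8) {
            len--;                           // one step per magnitude byte
        }
        buf[offset] = (byte) len;
        int n = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = n; idx != 0; idx--) { // big-endian magnitude bytes
            int shift = (idx - 1) * 8;
            buf[offset + 1 + (n - idx)] = (byte) ((v >> shift) & 0xFF);
        }
        return 1 + n;
    }

    // Total encoded size (marker plus magnitude bytes), from the first byte.
    static int decodeSize(byte first) {
        if (first >= -112) return 1;
        if (first < -120) return -119 - first;
        return -111 - first;
    }

    // True when the first byte marks a negative value.
    static boolean isNegative(byte first) {
        return first < -120 || (first >= -112 && first < 0);
    }

    // Decodes the value written by writeVLong at the given offset.
    static long readVLong(byte[] buf, int offset) {
        byte first = buf[offset];
        int size = decodeSize(first);
        if (size == 1) return first;
        long acc = 0;
        for (int i = 0; i < size - 1; i++) {
            acc = (acc << 8) | (buf[offset + 1 + i] & 0xFF);
        }
        return isNegative(first) ? (acc ^ -1L) : acc;
    }
}
```

A 9-byte buffer is enough for any long, which is consistent with the `new byte[9]` thread-local in the quoted code.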
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.2.patch > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, > udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604349#comment-14604349 ] Nishant Kelkar commented on HIVE-9557: -- Done. Could you please test for access now? > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604335#comment-14604335 ] Nishant Kelkar commented on HIVE-9557: -- Hey Alexander, Hmmm, in the review settings, I've added the group 'hive' and the user 'apivovarov'. I used rbt to create and upload the ticket to the Apache server. > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604318#comment-14604318 ] Nishant Kelkar commented on HIVE-9557: -- I'm not handling the "both empty strings" case. Will upload an updated patch. > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
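The "both empty strings" case matters because a token-count formula of the form common / (sqrt(n1) * sqrt(n2)) divides zero by zero when neither input has any tokens, and in floating point that produces NaN rather than an exception. A small sketch of the problem and one possible guard follows; the class name is hypothetical, and returning 0.0 from the guard is a placeholder, since the thread does not say what value the updated patch returns.

```java
final class EmptyCaseSketch {
    // Unguarded formula over token counts: shared tokens divided by the
    // product of the square roots of the two token counts.
    static double unguarded(int common, int n1, int n2) {
        return common / (Math.sqrt(n1) * Math.sqrt(n2));
    }

    // Guarded variant. The 0.0 result for empty input is a placeholder choice
    // made for this sketch, not the behaviour of the actual patch.
    static double guarded(int common, int n1, int n2) {
        if (n1 == 0 || n2 == 0) {
            return 0.0;
        }
        return unguarded(common, n1, n2);
    }
}
```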
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: HIVE-9557.1.patch Attached first revision on cosine similarity UDF. > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: HIVE-9557.1.patch, udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604068#comment-14604068 ]

Nishant Kelkar commented on HIVE-9557:
--------------------------------------

Figured out the issue: I set a dummy HADOOP_HOME variable pointing to HIVE_HOME. I also removed the commented-out queries from the udf_cosine_similarity.q clientpositive file. I'll upload a patch with an RB link soon.

> create UDF to measure strings similarity using Cosine Similarity algo
> ---------------------------------------------------------------------
>
> Key: HIVE-9557
> URL: https://issues.apache.org/jira/browse/HIVE-9557
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Reporter: Alexander Pivovarov
> Assignee: Nishant Kelkar
> Labels: CosineSimilarity, SimilarityMetric, UDF
> Attachments: udf_cosine_similarity-v01.patch
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
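The logged message, "HADOOP_HOME or hadoop.home.dir are not set.", names two places a Hadoop home can come from: an environment variable and a JVM system property. The sketch below shows why setting either one makes the message go away. The lookup order (system property first, then environment variable) and the class and method names are assumptions made for illustration; this is not the Hadoop Shell implementation.

```java
import java.io.IOException;

final class HadoopHomeSketch {
    // Takes both candidate values as arguments so the logic can be exercised
    // without changing the real process environment.
    static String resolve(String sysProp, String envVar) throws IOException {
        String home = (sysProp != null) ? sysProp : envVar;
        if (home == null || home.isEmpty()) {
            // Same wording as the message quoted from hive.log.
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        return home;
    }

    // Reads the two sources named in the log message from the running JVM.
    static String resolveFromProcess() throws IOException {
        return resolve(System.getProperty("hadoop.home.dir"),
                       System.getenv("HADOOP_HOME"));
    }
}
```

Under these assumptions, exporting HADOOP_HOME before running 'mvn test' (the workaround described above) is enough for the lookup to succeed, whatever directory it points to.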
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603988#comment-14603988 ]

Nishant Kelkar commented on HIVE-9557:
--------------------------------------

The TestCliDriver tests actually fail with the following error:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hive.cli.TestCliDriver
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.797 sec <<< FAILURE! - in org.apache.hadoop.hive.cli.TestCliDriver
testCliDriver_udf_cosine_similarity(org.apache.hadoop.hive.cli.TestCliDriver) Time elapsed: 0.346 sec <<< FAILURE!
junit.framework.AssertionFailedError: Unexpected exception junit.framework.AssertionFailedError: Client Execution failed with error code = 10014 running
select
cosine_similarity('kitten', 'sitting', ' '),
cosine_similarity('sitting kitten', 'kitten sitting', ' '),
cosine_similarity('sitting kitten', 'sitting kittens', ' '),
cosine_similarity('two#delimiters,here', 'two#delimiters#,here,too', '#,'),
cosine_similarity('test string', '', ' '),
cosine_similarity(cast(null as string), 'test string', ' '),
cosine_similarity('test string', cast(null as string), ','),
cosine_similarity(cast(null as string), cast(null as string), ' '),
cosine_similarity('a string', 'another string', '')
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
    at junit.framework.Assert.fail(Assert.java:57)
    at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:1984)
    at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:152)
    at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_cosine_similarity(TestCliDriver.java:134)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}

> create UDF to measure strings similarity using Cosine Similarity algo
> ---------------------------------------------------------------------
>
> Key: HIVE-9557
> URL: https://issues.apache.org/jira/browse/HIVE-9557
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Reporter: Alexander Pivovarov
> Assignee: Nishant Kelkar
> Labels: CosineSimilarity, SimilarityMetric, UDF
> Attachments: udf_cosine_similarity-v01.patch
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603987#comment-14603987 ] Nishant Kelkar commented on HIVE-9557: -- Hi [~apivovarov], I followed your instructions, and everything went fine till the step where I run the TestCliDriver with 'mvn test'. I get the following exception in ./itests/qtest/tmp/log/hive.log: {code} 2015-06-26 22:25:47,656 DEBUG [main]: util.Shell (Shell.java:checkHadoopHome(320)) - Failed to detect a valid hadoop home directory java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set. at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302) at org.apache.hadoop.util.Shell.(Shell.java:327) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2371) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.(HiveConf.java:366) at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:105) at org.apache.hadoop.hive.ql.QTestUtil.(QTestUtil.java:354) at org.apache.hadoop.hive.cli.TestCliDriver.(TestCliDriver.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35) at org.junit.internal.runners.SuiteMethod.(SuiteMethod.java:24) at org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:11) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59) at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26) at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59) at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:26) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) 2015-06-26 22:25:47,669 DEBUG [main]: util.Shell (Shell.java:isSetsidSupported(392)) - setsid is not available on this machine. So not using it. 2015-06-26 22:25:47,669 DEBUG [main]: util.Shell (Shell.java:isSetsidSupported(396)) - setsid exited with exit code 0 2015-06-26 22:25:48,408 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.ql.log.PerfLogger.level does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist 2015-06-26 22:25:48,409 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist 2015-06-26 22:25:48,410 WARN [main]: conf.HiveConf (HiveConf.java:initialize(2802)) - HiveConf of name hive.metastore.metadb.dir does not exist 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 2015-06-26 22:25:48,477 INFO [main]: 
server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:host.name=localhost 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:host.name=localhost 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.version=1.7.0_67 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.version=1.7.0_67 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.vendor=Oracle Corporation 2015-06-26 22:25:48,477 INFO [main]: server.ZooKeeperServer (Environment.java:logEnv(100)) - Server environment:java.vendor=
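A note on the first entry in the log above: it is written at DEBUG level, so it may not be what actually fails the test run. The message itself says that neither the `hadoop.home.dir` system property nor the `HADOOP_HOME` environment variable is set. A minimal sketch of an equivalent pre-flight check follows. `HadoopHomeCheck` and `resolve` are hypothetical names, and the lookup order (system property first, then environment variable) is my assumption about what Hadoop's `Shell.checkHadoopHome` does, not something the log confirms.

```java
final class HadoopHomeCheck {
    /**
     * Returns the first non-empty candidate, or null if neither is set.
     * Assumed order: the JVM system property wins over the environment variable.
     */
    static String resolve(String propertyValue, String envValue) {
        if (propertyValue != null && !propertyValue.isEmpty()) {
            return propertyValue;
        }
        if (envValue != null && !envValue.isEmpty()) {
            return envValue;
        }
        return null;
    }

    public static void main(String[] args) {
        String home = resolve(System.getProperty("hadoop.home.dir"),
                              System.getenv("HADOOP_HOME"));
        if (home == null) {
            System.out.println("HADOOP_HOME or hadoop.home.dir are not set");
        } else {
            System.out.println("Hadoop home candidate: " + home);
        }
    }
}
```

Running this in the same shell that launches `mvn test` shows which of the two settings, if any, the forked test JVM could pick up.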
[jira] [Commented] (HIVE-11114) Documentation of Pentaho Missing from Maven Central
[ https://issues.apache.org/jira/browse/HIVE-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601720#comment-14601720 ] Nishant Kelkar commented on HIVE-11114: --- [~leftylev] tagging you here for more help/info. > Documentation of Pentaho Missing from Maven Central > --- > > Key: HIVE-11114 > URL: https://issues.apache.org/jira/browse/HIVE-11114 > Project: Hive > Issue Type: Task >Reporter: Nishant Kelkar >Assignee: Nishant Kelkar >Priority: Minor > > I recently cloned the Hive Git repository. When I went into the hive/ql > sub-project and issued the command 'mvn clean compile -Phadoop-1', I got the > following build error: > [ERROR] Failed to execute goal on project hive-exec: Could not resolve > dependencies for project org.apache.hive:hive-exec:jar:2.0.0-SNAPSHOT: Could > not find artifact org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde > in US (http://repo.maven.apache.org/maven2) -> [Help 1] > This is because the pentaho-aggdesigner-algorithm dependency is not supported > by Maven central; however, it is supported by Conjars. > As a quick fix, I downloaded the jar from Conjars repo, and manually > installed this dependency to my local Maven by following the instructions > here: > http://www.mkyong.com/maven/how-to-include-library-manully-into-maven-local-repository/ > However, I feel this dependency should be supported on Maven central (I'm not > sure where to create this ticket/whom with, but Hive is my use case, so any > pointers greatly appreciated). > This ticket tracks the task of documenting this fact on the Hive wiki as an > additional Note. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600145#comment-14600145 ] Nishant Kelkar commented on HIVE-9557: -- [~apivovarov], I had a question: When I prepare a clientpositive/udf_cosine_similarity.q and a clientnegative/udf_cosine_similarity.q, how do I run these? Also, how do I create the q.out file? > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599747#comment-14599747 ] Nishant Kelkar commented on HIVE-9557: -- Thanks for the pointers! I'll modify the patch per your instructions and reupload. Thanks for working with me through my first patch! :) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599301#comment-14599301 ] Nishant Kelkar commented on HIVE-11091: --- The only significant change I see in above code snippets is: {code} srcFs = tbd.getSourcePath().getFileSystem(conf); dirs = srcFs.globStatus(tbd.getSourcePath()); {code} i.e. the way in which we get the file system handle and a list of the directories/files within the path provided. > Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Priority: Blocker > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using "Load data local inapth" command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599283#comment-14599283 ] Nishant Kelkar commented on HIVE-11091: --- I did a diff of Hive 0.11 vs. Hive 0.14 for the piece of code within MoveTask that is causing this error:
Hive-0.11:
{code}
Table table = db.getTable(tbd.getTable().getTableName());

if (work.getCheckFileFormat()) {
  // Get all files from the src directory
  FileStatus[] dirs;
  ArrayList<FileStatus> files;
  FileSystem fs;
  try {
    fs = FileSystem.get(table.getDataLocation(), conf);
    dirs = fs.globStatus(new Path(tbd.getSourceDir()));
    files = new ArrayList<FileStatus>();
    for (int i = 0; (dirs != null && i < dirs.length); i++) {
      files.addAll(Arrays.asList(fs.listStatus(dirs[i].getPath())));
      // We only check one file, so exit the loop when we have at least
      // one.
      if (files.size() > 0) {
        break;
      }
    }
  } catch (IOException e) {
    throw new HiveException(
        "addFiles: filesystem error in check phase", e);
  }
  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVECHECKFILEFORMAT)) {
    // Check if the file format of the file matches that of the table.
    boolean flag = HiveFileFormatUtils.checkInputFormat(
        fs, conf, tbd.getTable().getInputFileFormatClass(), files);
    if (!flag) {
      throw new HiveException(
          "Wrong file format. Please check the file's format.");
    }
  }
}
{code}
Hive-0.14:
{code}
Table table = db.getTable(tbd.getTable().getTableName());

if (work.getCheckFileFormat()) {
  // Get all files from the src directory
  FileStatus[] dirs;
  ArrayList<FileStatus> files;
  FileSystem srcFs; // source filesystem
  try {
    srcFs = tbd.getSourcePath().getFileSystem(conf);
    dirs = srcFs.globStatus(tbd.getSourcePath());
    files = new ArrayList<FileStatus>();
    for (int i = 0; (dirs != null && i < dirs.length); i++) {
      files.addAll(Arrays.asList(srcFs.listStatus(dirs[i].getPath(), FileUtils.HIDDEN_FILES_PATH_FILTER)));
      // We only check one file, so exit the loop when we have at least
      // one.
      if (files.size() > 0) {
        break;
      }
    }
  } catch (IOException e) {
    throw new HiveException(
        "addFiles: filesystem error in check phase", e);
  }
  if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVECHECKFILEFORMAT)) {
    // Check if the file format of the file matches that of the table.
    boolean flag = HiveFileFormatUtils.checkInputFormat(
        srcFs, conf, tbd.getTable().getInputFileFormatClass(), files);
    if (!flag) {
      throw new HiveException(
          "Wrong file format. Please check the file's format.");
    }
  }
}
{code}
> Unable to load data into hive table using "Load data local inapth" command > from unix named pipe > --- > > Key: HIVE-11091 > URL: https://issues.apache.org/jira/browse/HIVE-11091 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.14.0 > Environment: Unix,MacOS >Reporter: Manoranjan Sahoo >Priority: Blocker > > Unable to load data into hive table from unix named pipe in Hive 0.14.0 > Please find below the execution details in env ( Hadoop2.6.0 + Hive 0.14.0): > > $ mkfifo /tmp/test.txt > $ hive > hive> create table test(id bigint,name string); > OK > Time taken: 1.018 seconds > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Loading data to table default.test > Failed with exception addFiles: filesystem error in check phase > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.MoveTask > But in Hadoop 1.3 and hive 0.11.0 it works fine: > hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test; > Copying data from file:/tmp/test.txt > Copying file: file:/tmp/test.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
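Both versions quoted above share the same check-phase pattern: list what sits under each matched source path and stop as soon as one file has been collected. The sketch below re-creates only that pattern with `java.nio.file`, so it can be pointed at a local path (for example a FIFO created with `mkfifo`) without a Hadoop classpath. It is not Hive's code: `CheckPhaseSketch`, `isVisible` and `firstFiles` are hypothetical names, glob expansion is left out, and the rule that hidden names start with `_` or `.` is my assumption about what `FileUtils.HIDDEN_FILES_PATH_FILTER` excludes. It also makes no claim about where the Hadoop `FileSystem` calls actually fail for a named pipe.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class CheckPhaseSketch {
    /** Assumed hidden-file rule: names beginning with '_' or '.' are skipped. */
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Collects visible entries, stopping after the first source that yields any. */
    static List<Path> firstFiles(List<Path> sources) throws IOException {
        List<Path> files = new ArrayList<Path>();
        for (Path src : sources) {
            if (Files.isDirectory(src)) {
                // A directory contributes its visible children.
                try (DirectoryStream<Path> children = Files.newDirectoryStream(src)) {
                    for (Path child : children) {
                        if (isVisible(child)) {
                            files.add(child);
                        }
                    }
                }
            } else if (Files.exists(src) && isVisible(src)) {
                // Anything else that exists (plain file, FIFO) stands for itself.
                files.add(src);
            }
            if (!files.isEmpty()) {
                break; // one file is enough for a format check
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        List<Path> sources = new ArrayList<Path>();
        for (String arg : args) {
            sources.add(Paths.get(arg));
        }
        for (Path p : firstFiles(sources)) {
            System.out.println(p + " regularFile=" + Files.isRegularFile(p));
        }
    }
}
```

Passing `/tmp/test.txt` from the report as the only argument prints whether the local JVM sees the pipe as a regular file, which may help narrow down which call in the check phase treats it differently.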
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599265#comment-14599265 ] Nishant Kelkar commented on HIVE-9557: -- Hey [~kinow] and [~apivovarov], I've added a patch for the cosine similarity metric UDF and some test cases. This is my first time submitting a patch, so I guess I'm allowed 1 chance at the following question? :) What are all the next steps in this process, once a patch has been uploaded? I could also add this correspondence in an email to d...@hive.apache.org, for everyone else's benefit. Thanks! > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: udf_cosine_similarity-v01.patch > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > Labels: CosineSimilarity, SimilarityMetric, UDF > Attachments: udf_cosine_similarity-v01.patch > > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar reassigned HIVE-9557: Assignee: Nishant Kelkar (was: Alexander Pivovarov) > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Nishant Kelkar > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598962#comment-14598962 ] Nishant Kelkar commented on HIVE-9557: -- I can volunteer for creating a patch for this task (this would be my first patch ever!). If someone could point me to the place where I am to create this class and its tests, I could upload a patch. Thank you! > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598936#comment-14598936 ] Nishant Kelkar commented on HIVE-9557: -- [~apivovarov]: The reference implementation link you've provided seems to be broken. Did you mean to point here? -- https://github.com/Simmetrics/simmetrics/blob/master/simmetrics-core/src/main/java/org/simmetrics/metrics/CosineSimilarity.java > create UDF to measure strings similarity using Cosine Similarity algo > - > > Key: HIVE-9557 > URL: https://issues.apache.org/jira/browse/HIVE-9557 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Alexander Pivovarov >Assignee: Alexander Pivovarov > > algo description http://en.wikipedia.org/wiki/Cosine_similarity > {code} > --one word different, total 2 words > str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f > {code} > reference implementation: > https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
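For readers without access to either reference link, here is a minimal sketch of one way to read the ticket's example. It treats each string as a set of whitespace-separated tokens and computes |A ∩ B| / sqrt(|A| * |B|), which for 'Test String1' and 'Test String2' gives 1 / sqrt(2 * 2) = 0.5, the same value the ticket states. This is an assumption about the intended metric, not the attached patch and not the Simmetrics code; `CosineSketch`, `tokens` and `similarity` are hypothetical names, and returning 0 for empty input is my own choice.

```java
import java.util.HashSet;
import java.util.Set;

final class CosineSketch {
    /** Splits on whitespace and returns the distinct tokens. */
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<String>();
        for (String t : s.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Set-based cosine similarity: shared tokens over the geometric mean of the set sizes. */
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0; // assumed convention; avoids dividing by zero
        }
        Set<String> common = new HashSet<String>(ta);
        common.retainAll(tb);
        return common.size() / Math.sqrt((double) ta.size() * tb.size());
    }

    public static void main(String[] args) {
        System.out.println(similarity("Test String1", "Test String2")); // prints 0.5
    }
}
```

A Hive UDF would wrap a function like `similarity` behind the UDF interface and handle NULL inputs, which this sketch leaves out.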