RE: Hive double-precision question

2012-12-07 Thread Lauren Yang
This sounds like https://issues.apache.org/jira/browse/HIVE-2586, where 
comparing float/double values does not work as expected because of the way 
floating-point numbers are represented.

Perhaps a float is being compared with a double somewhere, because of some 
internal representation in the Java library or in the UDF.
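A small Java sketch of the kind of float/double mismatch HIVE-2586 describes: a value that passes through a float and is widened back to double no longer equals the same value parsed directly as a double, so an == comparison fails, while a comparison with a tolerance still succeeds. FloatDoubleDemo, viaFloat, nearlyEqual and the sample coordinate are hypothetical names and values chosen only for illustration; the tolerance comparison is a generic workaround, not necessarily the one Hive or the book recommends.

```java
public class FloatDoubleDemo {

    /** Round a double to float precision and widen it back (hypothetical FLOAT -> DOUBLE path). */
    static double viaFloat(double x) {
        return (double) (float) x;
    }

    /** Compare two doubles with an absolute tolerance instead of ==. */
    static boolean nearlyEqual(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    public static void main(String[] args) {
        double d = 0.1;
        double widened = viaFloat(d);
        System.out.println("0.1 as double:          " + d);
        System.out.println("0.1 via float, widened: " + widened);
        System.out.println("equal with == ?         " + (d == widened));
        System.out.println("equal within 1e-6 ?     " + nearlyEqual(d, widened, 1e-6));

        // Hypothetical coordinate in degrees: float rounding moves it by a few millionths of a degree.
        double coord = 72.123456789;
        System.out.println("rounding error (deg):   " + Math.abs(coord - viaFloat(coord)));
    }
}
```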

Ed Capriolo's book (Programming Hive) has a good section about workarounds and 
caveats for working with floats/doubles in Hive.

Thanks,
Lauren
From: Periya.Data [mailto:periya.d...@gmail.com]
Sent: Friday, December 07, 2012 1:28 PM
To: user@hive.apache.org; cdh-u...@cloudera.org
Subject: Hive double-precision question

Hi Hive Users,
I recently noticed an interesting behavior in Hive and I am unable to find the 
reason for it. Your insights into this are much appreciated.

I am trying to compute the distance between two zip codes. I have computed the 
distances on various 'platforms': SAS, R, Linux + Java, a Hive UDF, and Hive's 
built-in functions. The output of the Hive UDF and of Hive's built-in functions 
differs from the others starting at the 3rd decimal place. Here is an example:

zip1   zip2   Hadoop built-in function  SAS          R                  Linux + Java
00501  11720  4.49493083698542000       4.49508858   4.49508858054005   4.49508857976933000

The formula used to compute the distance in the UDF is:

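// Math.atan(1)/45 == pi/180: converts degrees to radians
double long1 = Math.atan(1)/45 * ux;   // ux, uy: first point (longitude, latitude) in degrees
double lat1 = Math.atan(1)/45 * uy;
double long2 = Math.atan(1)/45 * mx;   // mx, my: second point (longitude, latitude) in degrees
double lat2 = Math.atan(1)/45 * my;

double X1 = long1;
double Y1 = lat1;
double X2 = long2;
double Y2 = lat2;

// Spherical law of cosines; 3949.99 is an approximate Earth radius in miles
double distance = 3949.99 * Math.acos(Math.sin(Y1) *
    Math.sin(Y2) + Math.cos(Y1) * Math.cos(Y2) * Math.cos(X1 - X2));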
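To reproduce the 'Linux + Java' number outside Hive, the UDF formula can be wrapped in a standalone class. This is a sketch, assuming (as the UDF code suggests) that ux/mx are longitudes and uy/my are latitudes in decimal degrees; ZipDistance and distanceMiles are hypothetical names. Apart from one guard it performs the same operations in the same order as the UDF, so on the same JVM it should give the same result for the same double inputs.

```java
public class ZipDistance {

    /** Approximate Earth radius in miles, as used in the original formula. */
    static final double RADIUS_MILES = 3949.99;

    /** Degrees-to-radians factor written the same way as the UDF: atan(1)/45 == pi/180. */
    static final double DEG_TO_RAD = Math.atan(1) / 45;

    /**
     * Great-circle distance (spherical law of cosines) between two points
     * given as (longitude, latitude) in decimal degrees.
     */
    static double distanceMiles(double lon1, double lat1, double lon2, double lat2) {
        double x1 = DEG_TO_RAD * lon1, y1 = DEG_TO_RAD * lat1;
        double x2 = DEG_TO_RAD * lon2, y2 = DEG_TO_RAD * lat2;
        double c = Math.sin(y1) * Math.sin(y2)
                 + Math.cos(y1) * Math.cos(y2) * Math.cos(x1 - x2);
        // Guard added in this sketch: rounding can push c slightly above 1.0
        // for identical points, and Math.acos would then return NaN.
        c = Math.max(-1.0, Math.min(1.0, c));
        return RADIUS_MILES * Math.acos(c);
    }

    public static void main(String[] args) {
        // Hypothetical points (not real zip-code centroids), one degree of latitude apart.
        System.out.println(distanceMiles(0.0, 0.0, 0.0, 1.0));
    }
}
```

The clamp is the only deviation from the UDF; without it, a distance from a point to itself can come out as NaN instead of 0.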


The equivalent expression using Hive's built-in functions (same formula as above):
3949.99 * acos( sin(u_y_coord * (atan(1)/45)) * sin(m_y_coord * (atan(1)/45))
              + cos(u_y_coord * (atan(1)/45)) * cos(m_y_coord * (atan(1)/45))
              * cos(u_x_coord * (atan(1)/45) - m_x_coord * (atan(1)/45)) )




- The Hive built-in functions used are acos, sin, cos and atan.
- In another attempt, I used a Hive UDF with Java's Math library (Math.acos, 
Math.atan, etc.).
- All variables used are double.

I expected the value from the Hive UDF (and the built-in functions) to be 
identical to the one from plain Java code on Linux, but they are not. The 
built-in functions (as well as the UDF) give 4.49493083698542000, whereas a 
simple Java program running on Linux gives 4.49508857976933000. The Linux 
machine is similar to the Hadoop cluster machines.

Linux version - Red Hat 5.5
Java - latest.
Hive - 0.7.1
Hadoop - 0.20.2

This discrepancy is very consistent across thousands of zip-code distances; it 
is not a one-off occurrence. In some cases the values differ starting at the 
4th decimal place. Some more examples:

zip1   zip2   Hadoop built-in function  SAS          R                  Linux + Java
00602  00617  42.7909525390341          42.79072812  42.79072812185650  42.7907281218564
00603  00617  40.2404401665518          40.2402289   40.24022889740920  40.2402288974091
00605  00617  40.1919176128838          40.19186416  40.19186415807060  40.1918641580706
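One way to test Lauren's float/double hypothesis against numbers like these is to run the same formula twice: once with full double coordinates, and once with each coordinate first rounded to float, as would happen if the coordinate columns (or some intermediate value) were FLOAT rather than DOUBLE. This is only a sketch: nothing in the thread confirms that the Hive columns are FLOAT, the class and method names are hypothetical, and the coordinates below are placeholders, not the actual centroids of the zip codes above.

```java
public class FloatSensitivity {

    /** Same spherical-law-of-cosines formula as the UDF, with a guard against acos(>1). */
    static double distanceMiles(double lon1, double lat1, double lon2, double lat2) {
        double k = Math.atan(1) / 45; // degrees to radians, as in the UDF
        double y1 = k * lat1, y2 = k * lat2;
        double c = Math.sin(y1) * Math.sin(y2)
                 + Math.cos(y1) * Math.cos(y2) * Math.cos(k * lon1 - k * lon2);
        return 3949.99 * Math.acos(Math.min(1.0, c));
    }

    /** Hypothetical FLOAT columns: round each coordinate to float, then widen back to double. */
    static double distanceMilesFloatInputs(double lon1, double lat1, double lon2, double lat2) {
        return distanceMiles((float) lon1, (float) lat1, (float) lon2, (float) lat2);
    }

    public static void main(String[] args) {
        // Placeholder coordinates (degrees) for two nearby points.
        double lon1 = -73.0451, lat1 = 40.8157;
        double lon2 = -72.9932, lat2 = 40.8361;
        double exact = distanceMiles(lon1, lat1, lon2, lat2);
        double rounded = distanceMilesFloatInputs(lon1, lat1, lon2, lat2);
        System.out.println("double inputs: " + exact);
        System.out.println("float inputs:  " + rounded);
        System.out.println("difference:    " + Math.abs(exact - rounded));
    }
}
```

A float carries only about 7 significant digits, so a coordinate near 73 degrees can be off by a few millionths of a degree; at roughly 69 miles per degree that is on the order of 1e-4 miles, which is the same order as the gaps in the tables above. That makes input precision a plausible suspect, but only running this kind of check on the real coordinates can confirm it.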


I have not yet tested the return values of the individual sin, cos and atan 
functions; that will be my next test. But, at the very least, why is there a 
difference between the values from Hive's UDF/built-ins and those from plain 
Java on Linux? I am assuming that Hive's built-in mathematical functions are 
just the underlying Java functions.
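For the planned test of individual function returns, a minimal sketch is to evaluate each function with both java.lang.Math and java.lang.StrictMath on the same inputs. Math is allowed to use platform-specific implementations (its sin, cos, atan and acos must be within 1 ulp of the exact result), whereas StrictMath is specified to give the same fdlibm results everywhere, so comparing the two shows how large a library-level difference can get. The class name and sample inputs are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

public class MathVsStrictMath {

    /** Absolute difference between two implementations of the same function at x. */
    static double diff(DoubleUnaryOperator a, DoubleUnaryOperator b, double x) {
        return Math.abs(a.applyAsDouble(x) - b.applyAsDouble(x));
    }

    public static void main(String[] args) {
        // Hypothetical inputs in radians, similar in size to converted coordinates.
        double[] inputs = {0.7123, 1.2745, -1.2749, 0.0009, 1.0};
        for (double x : inputs) {
            System.out.printf("x=%-8s sin: %.3e  cos: %.3e  atan: %.3e%n",
                    x,
                    diff(Math::sin, StrictMath::sin, x),
                    diff(Math::cos, StrictMath::cos, x),
                    diff(Math::atan, StrictMath::atan, x));
        }
        // acos is only defined on [-1, 1]; test it near 1, where the distance formula evaluates it.
        System.out.printf("acos(0.9999999): %.3e%n", diff(Math::acos, StrictMath::acos, 0.9999999));
    }
}
```

If the differences come out at the level of the last bit (around 1e-16 for values of this size) or zero, the math library alone cannot explain a discrepancy in the 3rd or 4th decimal place, and the inputs are the more likely place to look.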

Thanks,
PD.


RE: hive-site.xml not found on classpath

2012-11-30 Thread Lauren Yang
You can check whether the classpath is being passed correctly to Hadoop by 
adding an echo statement around line 150 of the hive CLI script, where it 
passes the CLASSPATH variable to HADOOP_CLASSPATH:
# pass classpath to hadoop
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH}

You could also echo the classpath in the hadoop script (in your $HADOOP_HOME/bin 
directory) to check that it is still correct at the point where the CLI jar is 
invoked.

As for the log location: if it is not set in your hive-site.xml, you can set it 
by passing it in HIVE_OPTS when you invoke the command line.

Like so:
export HIVE_OPTS="-hiveconf hive.log.dir=$HIVE_HOME/logs"
Then run hive.

Or run:
hive --hiveconf hive.log.dir=$HIVE_HOME/logs

Thanks,
Lauren


From: Stephen Boesch [mailto:java...@gmail.com]
Sent: Friday, November 30, 2012 12:16 AM
To: user@hive.apache.org
Subject: Re: hive-site.xml not found on classpath

I am running 0.9.0 (you can see it from the classpath shown below).

steve@mithril:/shared/cdh4$ echo $HIVE_CONF_DIR
/shared/hive/conf
steve@mithril:/shared/cdh4$ ls -l $HIVE_CONF_DIR
total 152
-rw-r--r-- 1 steve steve 46053 2011-12-13 00:36 hive-default.xml.template
-rw-r--r-- 1 steve steve  1615 2012-11-13 23:37 hive-env.bullshit.sh
-rw-r--r-- 1 steve steve  1671 2012-11-28 01:43 hive-env.sh
-rw-r--r-- 1 steve steve  1593 2011-12-13 00:36 hive-env.sh.template
-rw-r--r-- 1 steve steve  1637 2011-12-13 00:36 hive-exec-log4j.properties.template
-rw-r--r-- 1 root  root   2056 2012-11-28 01:38 hive-log4j.properties
-rw-r--r-- 1 steve steve  2056 2012-03-25 12:49 hive-log4j.properties.template
-rw-r--r-- 1 steve steve  4415 2012-11-25 23:02 hive-site.xml
steve@mithril:/shared/cdh4$ echo $HIVE_HOME
/shared/hive
steve@mithril:/shared/cdh4$ echo $(which hive)
/shared/hive/bin/hive

Also, you can see that hive/conf is the first classpath entry.

After adding the debug statement:

classpath=/shared/hive/conf:/shared/hive/lib/antlr-runtime-3.0.1.jar:/shared/hive/lib/commons-cli-1.2.jar:/shared/hive/lib/commons-codec-1.3.jar:/shared/hive/lib/commons-collections-3.2.1.jar:/shared/hive/lib/commons-dbcp-1.4.jar:/shared/hive/lib/commons-lang-2.4.jar:/shared/hive/lib/commons-logging-1.0.4.jar:/shared/hive/lib/commons-logging-api-1.0.4.jar:/shared/hive/lib/commons-pool-1.5.4.jar:/shared/hive/lib/datanucleus-connectionpool-2.0.3.jar:/shared/hive/lib/datanucleus-core-2.0.3.jar:/shared/hive/lib/datanucleus-enhancer-2.0.3.jar:/shared/hive/lib/datanucleus-rdbms-2.0.3.jar:/shared/hive/lib/derby-10.4.2.0.jar:/shared/hive/lib/guava-r09.jar:/shared/hive/lib/hbase-0.92.0.jar:/shared/hive/lib/hbase-0.92.0-tests.jar:/shared/hive/lib/hive-builtins-0.9.0.jar:/shared/hive/lib/hive-cli-0.9.0.jar:/shared/hive/lib/hive-common-0.9.0.jar:/shared/hive/lib/hive-contrib-0.9.0.jar:/shared/hive/lib/hive_contrib.jar:/shared/hive/lib/hive-exec-0.9.0.jar:/shared/hive/lib/hive-hbase-handler-0.9.0.jar:/shared/hive/lib/hive-hwi-0.9.0.jar:/shared/hive/lib/hive-jdbc-0.9.0.jar:/shared/hive/lib/hive-metastore-0.9.0.jar:/shared/hive/lib/hive-pdk-0.9.0.jar:/shared/hive/lib/hive-serde-0.9.0.jar:/shared/hive/lib/hive-service-0.9.0.jar:/shared/hive/lib/hive-shims-0.9.0.jar:/shared/hive/lib/jackson-core-asl-1.8.8.jar:/shared/hive/lib/jackson-jaxrs-1.8.8.jar:/shared/hive/lib/jackson-mapper-asl-1.8.8.jar:/shared/hive/lib/jackson-xc-1.8.8.jar:/shared/hive/lib/JavaEWAH-0.3.2.jar:/shared/hive/lib/jdo2-api-2.3-ec.jar:/shared/hive/lib/jline-0.9.94.jar:/shared/hive/lib/json-20090211.jar:/shared/hive/lib/libfb303-0.7.0.jar:/shared/hive/lib/libfb303.jar:/shared/hive/lib/libthrift-0.7.0.jar:/shared/hive/lib/libthrift.jar:/shared/hive/lib/log4j-1.2.16.jar:/shared/hive/lib/mysql-connector-java-5.1.18-bin.jar:/shared/hive/lib/slf4j-api-1.6.1.jar:/shared/hive/lib/slf4j-log4j12-1.6.1.jar:/shared/hive/lib/stringtemplate-3.1-b1.jar:/shared/hive/lib/zookeeper-3.4.3.jar:


But even so:

  *   the log dir is still wrong (it writes to /tmp/${user}/hive.log instead of 
$HIVE_HOME/logs)
  *   the following message appears in the log file:
2012-11-30 00:12:31,775 WARN  conf.HiveConf (HiveConf.java:clinit(70)) - hive-site.xml not found on CLASSPATH
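The warning is logged from HiveConf's static initializer (the clinit in the log line). As far as I can tell from the 0.9 source, it looks up hive-site.xml as a classpath resource through the context class loader, so what matters is the classpath of the JVM that the hadoop script actually starts, not only the value echoed in the shell. A minimal sketch for checking that from inside a JVM, assuming you can launch it with the same classpath; ClasspathCheck and its methods are hypothetical names.

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Look a resource up via the context class loader, falling back to this class's loader. */
    static URL findResource(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        return loader.getResource(name);
    }

    /** Directory entries on java.class.path that directly contain the given file. */
    static List<String> entriesContaining(String fileName) {
        List<String> hits = new ArrayList<>();
        String cp = System.getProperty("java.class.path", "");
        for (String entry : cp.split(File.pathSeparator)) {
            // Skip empty entries, e.g. from a trailing ':' as in the classpath above.
            if (!entry.isEmpty() && new File(entry, fileName).isFile()) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"hive-site.xml", "hive-log4j.properties"}) {
            URL url = findResource(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
            System.out.println("  directories containing it: " + entriesContaining(name));
        }
    }
}
```

If the directory appears in java.class.path but the resource lookup still fails, the JVM that logs the warning is probably using a different class loader or classpath than the one being echoed.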





2012/11/30 Bing Li <sarah.lib...@gmail.com>
Which version of Hive do you use?

Could you try adding the following debug line to bin/hive, just before Hive 
actually executes, and see the result?

# debug line: print CLASSPATH just before Hive runs
echo "CLASSPATH=$CLASSPATH"

if [ "$TORUN" = "" ]; then
   echo "Service $SERVICE not found"
   echo "Available Services: $SERVICE_LIST"
   exit 7
else
   $TORUN "$@"
fi

The version I used is 0.9.0


2012/11/30 Stephen Boesch <java...@gmail.com>
Yes, I do mean the log is in the wrong location, since it was set to a 
persistent path in $HIVE_CONF_DIR/hive-log4j.properties.

None of the files in that directory appear to be picked up properly: neither 
hive-site.xml nor hive-log4j.properties.

I have put echo statements into the hive and hive-config.sh shell scripts, and 
the echo statements prove