Re: HIVE and S3 folders
Hi Mark,

I could understand it if EMR were the only thing that recognized these paths. However, s3cmd (a utility used to copy files to S3) also recognizes the files created by EMR, and EMR can read the files that s3cmd creates.

When I look at the debug information, Hive seems to be sending an extra "/" when creating a table. Here is a debug message; if you look at the path, there is both a "/" and a "%2F". Probably a bug in the code?

    hive> create external table wc(site string, cnt int) location 's3://masked/wcoverlay/';
    GETWed, 07 Mar 2012 18:26:03 GMT/masked/%2Fwcoverlay

On Wed, Mar 7, 2012 at 12:56 PM, Mark Grover wrote:
> Hi Balaji,
> The Hive/Hadoop installation that comes with EMR is Amazon-specific and has
> some additional patches that make S3 paths as recognizable as HDFS paths.
>
> However, if you are using EC2, you most likely have an Apache or Cloudera
> installation, which doesn't recognize S3 paths.
>
> Mark
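To make the "%2F" observation concrete, here is a minimal Python sketch of what that request path decodes to. It assumes the debug line shows a path-style request of the form /bucket/key with the key percent-encoded; the helper names `split_request_path` and `encode_key` are made up for illustration and are not part of Hive, Hadoop, or s3cmd.

```python
from typing import Tuple
from urllib.parse import quote, unquote


def split_request_path(request_path: str) -> Tuple[str, str]:
    """Split a path-style request path '/bucket/encoded-key' into
    (bucket, decoded key). Hypothetical helper for illustration only."""
    bucket, _, encoded_key = request_path.lstrip("/").partition("/")
    return bucket, unquote(encoded_key)


def encode_key(key: str) -> str:
    """Percent-encode a key so that '/' inside the key becomes '%2F'."""
    return quote(key, safe="")


# The path from the debug message decodes to a key with a leading slash.
bucket, key = split_request_path("/masked/%2Fwcoverlay")
print(bucket, repr(key))

# A key without the leading slash would not produce '%2F' at all.
print(encode_key("/wcoverlay"))
print(encode_key("wcoverlay"))
```

If that assumption holds, the client is asking S3 for the key "/wcoverlay" rather than "wcoverlay", which would be consistent with the extra "/" folder described in the original message.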
Re: HIVE and S3 folders
Hi Balaji,

The Hive/Hadoop installation that comes with EMR is Amazon-specific and has some additional patches that make S3 paths as recognizable as HDFS paths.

However, if you are using EC2, you most likely have an Apache or Cloudera installation, which doesn't recognize S3 paths.

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation

----- Original Message -----
From: "Balaji Rao"
To: user@hive.apache.org
Sent: Wednesday, March 7, 2012 12:48:31 PM
Subject: HIVE and S3 folders

I'm having problems with Hive on EC2 reading files on S3.

I have a lot of files and folders on S3 created by s3cmd and used by Elastic MapReduce (Hive), and they work interchangeably: files created by Hive on EMR can be read by s3cmd and vice versa. However, I'm having problems with Hive/Hadoop running on EC2. Both Hive 0.7 and 0.8 seem to create an additional folder "/" on S3.

For example, if I have a file s3://bucket/path/0 created by s3cmd or Hive on EMR and I try to create an external table with Hive on EC2:

    create external table wc(site string, cnt int) row format delimited fields terminated by '\t' stored as textfile location 's3://bucket/path'

This does not recognize the EMR-created S3 folders; instead I see a new folder "/":

    / "/" / path

Am I missing something here?

Balaji
HIVE and S3 folders
I'm having problems with Hive on EC2 reading files on S3.

I have a lot of files and folders on S3 created by s3cmd and used by Elastic MapReduce (Hive), and they work interchangeably: files created by Hive on EMR can be read by s3cmd and vice versa. However, I'm having problems with Hive/Hadoop running on EC2. Both Hive 0.7 and 0.8 seem to create an additional folder "/" on S3.

For example, if I have a file s3://bucket/path/0 created by s3cmd or Hive on EMR and I try to create an external table with Hive on EC2:

    create external table wc(site string, cnt int) row format delimited fields terminated by '\t' stored as textfile location 's3://bucket/path'

This does not recognize the EMR-created S3 folders; instead I see a new folder "/":

    / "/" / path

Am I missing something here?

Balaji
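For anyone trying to picture the extra "/" folder, the following Python sketch shows one way it could arise. S3 keys are flat strings, and folder views are derived by splitting keys on "/". The function `folder_view` is a hypothetical illustration, and showing an empty leading segment as a folder named "/" is an assumption about how a browsing tool might render such keys, not documented behavior of any specific tool.

```python
def folder_view(keys):
    """Build a nested folder listing from flat S3-style keys.

    Assumption: an empty path segment (caused by a leading or doubled
    slash in the key) is displayed as a folder named "/".
    """
    tree = {}
    for key in keys:
        node = tree
        parts = key.split("/")
        for part in parts[:-1]:
            name = part if part else "/"  # empty segment shown as "/"
            node = node.setdefault(name, {})
        node[parts[-1]] = None  # leaf entry for the object itself
    return tree


# Key as written by s3cmd or EMR in the example above.
print(folder_view(["path/0"]))

# The same key with a leading slash gains an extra top-level folder.
print(folder_view(["/path/0"]))
```

Under that assumption, the second listing nests "path" inside a folder named "/", which matches the `/ "/" / path` layout reported in the message.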