So evidently LOAD DATA actually just copies a file to HDFS. What is the solution if you have thousands of files and then attempt a Hive query? My understanding is that this will be dead slow later.
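[Editor's note: one common workaround for the thousands-of-small-files concern is to merge the small local files into a few large chunks before issuing LOAD DATA, so the table ends up backed by a handful of big HDFS files instead of thousands of tiny ones (each tiny file otherwise tends to become its own map task). The sketch below is illustrative, not from the thread; the function name and the 64 MB default are assumptions.]

```python
import glob
import os

def merge_logs(pattern, out_prefix, max_bytes=64 * 1024 * 1024):
    """Concatenate many small log files (matched by a glob pattern)
    into larger chunk files of roughly max_bytes each, so that a
    subsequent LOAD DATA loads a few big files instead of thousands
    of tiny ones. Returns the list of chunk paths written."""
    chunk_paths = []
    out = None
    written = 0
    for path in sorted(glob.glob(pattern)):
        # Start a new chunk when there is none yet or the current
        # one has reached the size threshold.
        if out is None or written >= max_bytes:
            if out:
                out.close()
            chunk = "%s.%d" % (out_prefix, len(chunk_paths))
            out = open(chunk, "wb")
            chunk_paths.append(chunk)
            written = 0
        with open(path, "rb") as f:
            data = f.read()
            out.write(data)
            written += len(data)
    if out:
        out.close()
    return chunk_paths
```

Each resulting chunk can then be loaded with its own LOAD DATA LOCAL INPATH statement.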
Suhail

On Sun, Apr 5, 2009 at 10:52 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
> Ragu,
>
> I managed to get it working; it seems there were just inconsistencies
> between the metastore_db I was using in the client and the Python one.
>
> I should just always use Python from now on to make changes to
> metastore_db, instead of copying it around and using the Hive client.
>
> Suhail
>
> On Sun, Apr 5, 2009 at 10:44 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
>> Oh, never mind: of course Python is using the metastore_db that the
>> Hive service is using.
>>
>> Suhail
>>
>> On Sun, Apr 5, 2009 at 10:42 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
>>> This is kind of odd; it's as if it's not using the same metastore_db:
>>>
>>> li57-125 ~/test: ls
>>> derby.log  hive_test.py  hive_test.pyc  metastore_db  page_view.log.2
>>>
>>> li57-125 ~/test: hive
>>> Hive history file=/tmp/hadoop/hive_job_log_hadoop_200904051740_1405686854.txt
>>> hive> select count(1) from page_views;
>>> Total MapReduce jobs = 1
>>> Number of reduce tasks determined at compile time: 1
>>> In order to change the average load for a reducer (in bytes):
>>>   set hive.exec.reducers.bytes.per.reducer=<number>
>>> In order to limit the maximum number of reducers:
>>>   set hive.exec.reducers.max=<number>
>>> In order to set a constant number of reducers:
>>>   set mapred.reduce.tasks=<number>
>>> Job need not be submitted: no output: Success
>>> OK
>>> Time taken: 4.909 seconds
>>>
>>> li57-125 ~/test: python hive_test.py
>>> Connecting to HiveServer....
>>> Opening transport...
>>> select count(1) from page_views
>>> Number of rows: ['20297']
>>>
>>> On Sat, Apr 4, 2009 at 11:02 PM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
>>>> No logs are generated when I run the python file in /tmp/hadoop/
>>>>
>>>> Suhail
>>>>
>>>> On Sat, Apr 4, 2009 at 10:38 PM, Raghu Murthy <rmur...@facebook.com> wrote:
>>>>> Is there no entry in the server logs about the error?
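[Editor's note: the metastore_db inconsistency described above is characteristic of Hive's default embedded Derby metastore, which lives in a `metastore_db` directory relative to whatever working directory the client was launched from, so two clients started in different directories silently see different metadata. One common fix, sketched here as an assumption rather than taken from the thread (the path is an example), is to pin the metastore to an absolute location in hive-site.xml:]

```xml
<!-- hive-site.xml: point the embedded Derby metastore at one absolute
     path so every client sees the same metadata regardless of the
     directory it was launched from. The path below is an example. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hadoop/metastore_db;create=true</value>
</property>
```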
>>>>>
>>>>> On 4/4/09 10:24 PM, "Suhail Doshi" <digitalwarf...@gmail.com> wrote:
>>>>>
>>>>>> I am running the Hive server and Hadoop on the same server as the file. I am
>>>>>> also running the Python script and Hive server under the same user, and the
>>>>>> file is located in a directory this user owns.
>>>>>>
>>>>>> I am not sure why it's still not loading.
>>>>>>
>>>>>> Suhail
>>>>>>
>>>>>> On Sat, Apr 4, 2009 at 10:14 PM, Raghu Murthy <rmur...@facebook.com> wrote:
>>>>>>> Is the file accessible to the HiveServer? We currently don't ship the file
>>>>>>> from the client machine to the server machine.
>>>>>>>
>>>>>>> On 4/3/09 10:26 PM, "Suhail Doshi" <suh...@mixpanel.com> wrote:
>>>>>>>
>>>>>>>> I seem to be having problems with LOAD DATA with a file on my local system,
>>>>>>>> trying to get it into Hive:
>>>>>>>>
>>>>>>>> li57-125 ~/test: python hive_test.py
>>>>>>>> Connecting to HiveServer....
>>>>>>>> Opening transport...
>>>>>>>> LOAD DATA LOCAL INPATH '/home/hadoop/test/page_view.log.2' INTO TABLE page_views
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "hive_test.py", line 36, in <module>
>>>>>>>>     c.client.execute(query)
>>>>>>>>   File "/home/hadoop/hive/build/dist/lib/py/hive_service/ThriftHive.py", line 42, in execute
>>>>>>>>     self.recv_execute()
>>>>>>>>   File "/home/hadoop/hive/build/dist/lib/py/hive_service/ThriftHive.py", line 63, in recv_execute
>>>>>>>>     raise result.ex
>>>>>>>> hive_service.ttypes.HiveServerException: {}
>>>>>>>>
>>>>>>>> The same query works fine through the Hive client but doesn't seem to work
>>>>>>>> through the Python one. Executing a query through the Python client works
>>>>>>>> fine as long as it's not a LOAD DATA. I wish there were a better error message
>>>>>>>> describing why the exception is occurring.
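[Editor's note: since, per Raghu's reply above, the file is not shipped from the client to the server, the INPATH must be readable where the HiveServer process runs. When the script runs on the same host as the server, as in this thread, a local pre-flight check can turn the opaque empty HiveServerException into a clear error. This is an illustrative sketch; the function name is an assumption, not part of the Hive API.]

```python
import os

def preflight_load_path(path):
    """Sanity-check a LOAD DATA LOCAL INPATH argument before sending
    it to HiveServer. 'LOCAL' means local to the *server* process, so
    these checks are only meaningful when run on the HiveServer host.
    Raises IOError with a descriptive message on failure."""
    if not os.path.exists(path):
        raise IOError("no such file on the HiveServer host: %r" % path)
    if not os.access(path, os.R_OK):
        raise IOError("file not readable by the HiveServer user: %r" % path)
    return path
```

For example, calling preflight_load_path('/home/hadoop/test/page_view.log.2') before client.execute(...) would fail fast with the actual reason instead of the bare HiveServerException.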
--
http://mixpanel.com
Blog: http://blog.mixpanel.com