Re: Dynamic partitioned parquet tables

2015-10-09 Thread Slava Markeyev
When hive.optimize.sort.dynamic.partition is off, Hive opens a file writer
for each new partition key as it is encountered and writes records to the
appropriate files. Since the parquet writer buffers writes in memory before
flushing to disk, this can lead to OOMs when you have lots of partitions and
open files. With hive.optimize.sort.dynamic.partition on, the records are
sorted by partition key before writing starts, so all records for a
partition are written in one contiguous chunk before the file for the next
partition is opened.
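
As a rough sketch of how the load might look with the setting on (the
staging and target table names here are placeholders, not from your setup):

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition=true;

-- the dynamic partition column (txn_date) goes last in the select list
INSERT OVERWRITE TABLE pos_parquet PARTITION (txn_date)
SELECT store_id, amount, txn_date
FROM pos_text_staging;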

The issue you're encountering is that partition creation in the metastore is
slow. I don't think that's avoidable at the moment. I provided a patch (see
HIVE-10385), but it's not for everyone. Since your size per partition is so
small, I'd recommend not partitioning by day and simply making the date a
column. For queries that span months or years you'll probably spend more
time listing files and fetching partitions during query planning than
actually scanning your data.
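
A minimal sketch of that alternative, with hypothetical column names: the
date becomes an ordinary column and queries simply filter on it.

CREATE EXTERNAL TABLE pos_parquet_flat (
  store_id STRING,
  amount   DOUBLE,
  txn_date STRING
)
STORED AS PARQUET;

SELECT count(*)
FROM pos_parquet_flat
WHERE txn_date BETWEEN '2014-01-01' AND '2014-12-31';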

-Slava

On Fri, Oct 9, 2015 at 4:12 PM, Yogesh Keshetty  wrote:

>
> Has anyone tried this? Please help me if you have any knowledge of this kind
> of use case.
>
>
> --
> From: yogesh.keshe...@outlook.com
> To: user@hive.apache.org
> Subject: Dynamic partitioned parquet tables
> Date: Fri, 9 Oct 2015 11:20:57 +0530
>
>
>  Hello,
>
> I have a question regarding parquet tables. We have POS data, and we want to
> store it in per-day partitions. We sqoop the data into an external table in
> text file format and then try to insert it into an external table that is
> partitioned by date and, due to some requirements, stored as parquet files.
> The average file size per day is around 2 MB. I know that parquet is not
> meant for lots of small files, but we wanted to keep it that way. The
> problem is that during the initial historical data load, when we try to
> create dynamic partitions, the job keeps failing with memory issues no
> matter how much memory I give it. After some research I found that by
> turning on "set hive.optimize.sort.dynamic.partition = true" we could create
> the dynamic partitions. But this is taking longer than we expected; is there
> any way we can boost the performance? Also, in spite of turning the property
> on, when we try to create dynamic partitions for multiple years of data at a
> time we again run into a heap error. How can we handle this problem? Please
> help us.
>
> Thanks in advance!
>
> Thank you,
> Yogesh
>



-- 

Slava Markeyev | Engineering | Upsight
Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


nested join issue

2015-06-11 Thread Slava Markeyev
I'm running into a peculiar issue with nested joins and an outer select. I see
this behavior on 1.1.0 and 1.2.0 but not on 0.13, which seems like a regression.

The following query produces no results:

select s
from (
  select last.*, action.st2, action.n
  from (
select purchase.s, purchase.timestamp, max (mevt.timestamp) as
last_stage_timestamp
from (select * from purchase_history) purchase
join (select * from cart_history) mevt
on purchase.s = mevt.s
where purchase.timestamp > mevt.timestamp
group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

While this one does produce results

select *
from (
  select last.*, action.st2, action.n
  from (
select purchase.s, purchase.timestamp, max (mevt.timestamp) as
last_stage_timestamp
from (select * from purchase_history) purchase
join (select * from cart_history) mevt
on purchase.s = mevt.s
where purchase.timestamp > mevt.timestamp
group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

1 21 20 Bob 1234
1 31 30 Bob 1234
3 51 50 Jeff 1234

The setup to test this is:

create table purchase_history (s string, product string, price double,
timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);

create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);

create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);

I realize select * and select s are not all that interesting in this
context, but what led me to this issue was that select count(distinct s) was
not returning results. The above queries are the simplified queries that
reproduce the issue. I will note that if I convert the inner join to a table
and select from that, the issue does not appear.
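
For reference, a rough sketch of that workaround (materializing the inner
join first); the intermediate table name is made up:

create table last_stage as
select purchase.s, purchase.timestamp, max(mevt.timestamp) as last_stage_timestamp
from purchase_history purchase
join cart_history mevt
on purchase.s = mevt.s
where purchase.timestamp > mevt.timestamp
group by purchase.s, purchase.timestamp;

select last.s
from last_stage last
join events action
on last.s = action.s and last.last_stage_timestamp = action.timestamp;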

-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: Very slow dynamic partition load

2015-06-11 Thread Slava Markeyev
This is something that a few of us have run into. I think the bottleneck is
in the partition creation calls to the metastore. My workaround was HIVE-10385,
which optionally removes partition creation in the metastore, but this isn't
a solution for everyone. If you don't require actual partitions in the
table, but simply partitioned data in HDFS, give it a shot. It may be
worthwhile to look into optimizations for this use case.

-Slava

On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota 
wrote:

> Hi All,
>
> I have a table which is partitioned on two columns (customer, date). I'm
> loading some data into the table using a Hive query. The MapReduce job
> completed within a few minutes and needs to "commit" the data to the
> appropriate partitions. There were about 32000 partitions generated. The
> commit phase has been running for almost 16 hours and has not finished yet.
> I've been monitoring jmap, and don't believe it's a memory or gc issue.
> I've also been looking at jstack and am not sure why it's so slow. I'm not
> sure what the problem is, but it seems to be a Hive performance issue when it
> comes to "highly partitioned" tables.
>
> Any thoughts on this issue would be greatly appreciated.
>
> Thanks in advance,
> Pradeep
>



-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: Hive 1.2.0 Unable to start metastore

2015-06-08 Thread Slava Markeyev
Sounds like you ran into this:
https://issues.apache.org/jira/browse/HIVE-9198

On Mon, Jun 8, 2015 at 1:06 PM, James Pirz  wrote:

> Thanks !
> There was a similar problem: conflicting jars, but between Hive and Spark.
> My eventual goal is running Spark against Hive's tables, and with Spark's
> libraries on my path as well, there were conflicting jar files.
> I removed the Spark libraries from my PATH and Hive's services (remote
> metastore) started up fine.
> For now I am good, but I am just wondering what the correct way to fix
> this is. Once I want to start Spark, I need to include its libraries in the
> PATH, and the conflict seems inevitable.
>
>
>
> On Mon, Jun 8, 2015 at 12:09 PM, Slava Markeyev <
> slava.marke...@upsight.com> wrote:
>
>> It sounds like you are running into a jar conflict between the hive
>> packaged derby and hadoop distro packaged derby. Look for derby jars on
>> your system to confirm.
>>
>> In the meantime, try adding this to your hive-env.sh or hadoop-env.sh
>> file:
>>
>> export HADOOP_USER_CLASSPATH_FIRST=true
>>
>> On Mon, Jun 8, 2015 at 11:52 AM, James Pirz  wrote:
>>
>>> I am trying to run Hive 1.2.0 on Hadoop 2.6.0 (on a cluster, running
>>> CentOS). I am able to start the Hive CLI and run queries. But once I try to
>>> start Hive's metastore (I am trying to use the built-in Derby) using:
>>>
>>> hive --service metastore
>>>
>>> I keep getting Class Not Found Exceptions for
>>> "org.apache.derby.jdbc.EmbeddedDriver" (See below).
>>>
>>> I have exported $HIVE_HOME and added $HIVE_HOME/bin and $HIVE_HOME/lib
>>> to the $PATH, and I see that there is a "derby-10.11.1.1.jar" file under
>>> $HIVE_HOME/lib.
>>>
>>> In my hive-site.xml (under $HIVE_HOME/conf) I have:
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>   <value>org.apache.derby.jdbc.EmbeddedDriver</value>
>>>   <description>Driver class name for a JDBC metastore</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>   <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
>>>   <description>JDBC connect string for a JDBC metastore</description>
>>> </property>
>>>
>>> So I am not sure why it cannot find it.
>>> Any suggestion or hint would be highly appreciated.
>>>
>>>
>>> Here is the error:
>>>
>>> javax.jdo.JDOFatalInternalException: Error creating transactional
>>> connection factory
>>> ...
>>> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.derby.jdbc.EmbeddedDriver
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at java.lang.Class.newInstance(Class.java:379)
>>> at
>>> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>>> at
>>> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>>> at
>>> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>>> at
>>> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>>> at
>>> org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>>>
>>>
>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>>
>> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>>
>
>


-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: Hive 1.2.0 Unable to start metastore

2015-06-08 Thread Slava Markeyev
It sounds like you are running into a jar conflict between the hive
packaged derby and hadoop distro packaged derby. Look for derby jars on
your system to confirm.

In the meantime, try adding this to your hive-env.sh or hadoop-env.sh file:

export HADOOP_USER_CLASSPATH_FIRST=true
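
A quick way to check for duplicate Derby jars (the paths here are
assumptions; adjust for your install layout):

find $HIVE_HOME/lib $HADOOP_HOME -name 'derby*.jar'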

On Mon, Jun 8, 2015 at 11:52 AM, James Pirz  wrote:

> I am trying to run Hive 1.2.0 on Hadoop 2.6.0 (on a cluster, running
> CentOS). I am able to start the Hive CLI and run queries. But once I try to
> start Hive's metastore (I am trying to use the built-in Derby) using:
>
> hive --service metastore
>
> I keep getting Class Not Found Exceptions for
> "org.apache.derby.jdbc.EmbeddedDriver" (See below).
>
> I have exported $HIVE_HOME and added $HIVE_HOME/bin and $HIVE_HOME/lib to
> the $PATH, and I see that there is a "derby-10.11.1.1.jar" file under
> $HIVE_HOME/lib.
>
> In my hive-site.xml (under $HIVE_HOME/conf) I have:
>
> <property>
>   <name>javax.jdo.option.ConnectionDriverName</name>
>   <value>org.apache.derby.jdbc.EmbeddedDriver</value>
>   <description>Driver class name for a JDBC metastore</description>
> </property>
>
> <property>
>   <name>javax.jdo.option.ConnectionURL</name>
>   <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
>   <description>JDBC connect string for a JDBC metastore</description>
> </property>
>
> So I am not sure why it cannot find it.
> Any suggestion or hint would be highly appreciated.
>
>
> Here is the error:
>
> javax.jdo.JDOFatalInternalException: Error creating transactional
> connection factory
> ...
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.derby.jdbc.EmbeddedDriver
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at java.lang.Class.newInstance(Class.java:379)
> at
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
> at
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
> at
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
> at
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
> at
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>
>


-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: Hive 1.2.0 fails on Hadoop 2.6.0

2015-06-06 Thread Slava Markeyev
What's the rest of that stack trace? Do you see a NoClassDefFoundError?

-Slava

On Fri, Jun 5, 2015 at 7:28 PM, James Pirz  wrote:

> I am trying to run Apache Hive 1.2.0 on Hadoop 2.6.0 on a cluster. My
> Hadoop cluster comes up fine (I start HDFS and YARN), then I create the
> required tmp and warehouse directories in HDFS and try to start the Hive CLI
> (I do not do anything with HCatalog or HiveServer2), but I keep getting
> errors related to the metastore (see below). Replacing Hive 1.2.0 with Hive
> 0.13, it just works fine.
>
> Has anything changed about starting Hive 1.x on Hadoop 2.x compared to
> Hive 0.x? (This is the first time I am trying Hive on Hadoop 2.)
>
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:519)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
> ….
>



-- 

Slava Markeyev | Engineering | Upsight

Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-17 Thread Slava Markeyev
I've created HIVE-10385 and attached a patch. Unit tests to come.

-Slava

On Fri, Apr 17, 2015 at 1:34 PM, Chris Roblee  wrote:

> Hi Slava,
>
> We would be interested in reviewing your patch.  Can you please provide
> more details?
>
> Is there any other way to disable the partition creation step?
>
> Thanks,
> Chris
>
> On 4/13/15 10:59 PM, Slava Markeyev wrote:
>
>> This is something I've encountered when doing ETL with Hive and having it
>> create tens of thousands of partitions. The issue
>> is that each partition needs to be added to the metastore and this is an
>> expensive operation to perform. My workaround was
>> adding a flag to Hive that optionally disables the metastore partition
>> creation step. This may not be a solution for
>> everyone as that table then has no partitions and you would have to run
>> msck repair, but depending on your use case, you
>> may just want the data in hdfs.
>>
>> If there is interest in having this be an option I'll make a ticket and
>> submit the patch.
>>
>> -Slava
>>
>> On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A <cheng.a...@intel.com> wrote:
>>
>> Hi Tianqi,
>>
>> Can you attach hive.log as more detailed information?
>>
>> +Sergio
>>
>> Yours,
>>
>> Ferdinand Xu
>>
>> *From:* Tianqi Tong [mailto:tt...@brightedge.com]
>> *Sent:* Friday, April 10, 2015 1:34 AM
>> *To:* user@hive.apache.org
>> *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k
>> Partitions
>>
>> Hello Hive,
>>
>> I'm a developer using Hive to process TB level data, and I'm having
>> some difficulty loading the data to table.
>>
>> I have 2 tables now:
>>
>>
>> -- table_1:
>>
>> CREATE EXTERNAL TABLE `table_1`(
>>
>>`keyword` string,
>>
>>`domain` string,
>>
>>`url` string
>>
>>)
>>
>> PARTITIONED BY (yearmonth INT, partition1 STRING)
>>
>> STORED AS RCfile
>>
>>
>> -- table_2:
>>
>> CREATE EXTERNAL TABLE `table_2`(
>>
>>`keyword` string,
>>
>>`domain` string,
>>
>>`url` string
>>
>>)
>>
>> PARTITIONED BY (yearmonth INT, partition2 STRING)
>>
>> STORED AS Parquet
>>
>>
>> I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1
>> with dynamic partitioning, and the number of
>> partitions grows dramatically from 1500 to 40k (because I want to use
>> something else as partitioning).
>>
>> The mapreduce job was fine.
>>
>> Somehow the process got stuck at "Loading data to table
>> default.table_2 (yearmonth=null, domain_prefix=null)", and
>> I've been waiting for hours.
>>
>> Is this expected when we have 40k partitions?
>>
>> ------
>>
>> Refs - Here are the parameters that I used:
>>
>> export HADOOP_HEAPSIZE=16384
>>
>> set PARQUET_FILE_SIZE=268435456;
>>
>> set parquet.block.size=268435456;
>>
>> set dfs.blocksize=268435456;
>>
>> set parquet.compression=SNAPPY;
>>
>> SET hive.exec.dynamic.partition.mode=nonstrict;
>>
>> SET hive.exec.max.dynamic.partitions=50;
>>
>> SET hive.exec.max.dynamic.partitions.pernode=5;
>>
>> SET hive.exec.max.created.files=100;
>>
>>
>> Thank you very much!
>>
>> Tianqi Tong
>>
>>
>>
>>
>> --
>>
>> Slava Markeyev | Engineering | Upsight
>>
>>
>


-- 

Slava Markeyev | Engineering | Upsight
<http://www.linkedin.com/in/slavamarkeyev>


Re: [Hive] Slow Loading Data Process with Parquet over 30k Partitions

2015-04-13 Thread Slava Markeyev
This is something I've encountered when doing ETL with Hive and having it
create tens of thousands of partitions. The issue is that each partition
needs to be added to the metastore, and this is an expensive operation to
perform. My workaround was adding a flag to Hive that optionally disables the
metastore partition creation step. This may not be a solution for everyone,
as the table then has no partitions and you would have to run msck repair,
but depending on your use case you may just want the data in HDFS.
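
If you do go that route, a minimal sketch of registering the data
afterwards, using the table_2 definition from the quoted message below:

MSCK REPAIR TABLE table_2;

-- or add an individual partition by hand:
ALTER TABLE table_2 ADD IF NOT EXISTS PARTITION (yearmonth=201504, partition2='example');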

If there is interest in having this be an option I'll make a ticket and
submit the patch.

-Slava

On Mon, Apr 13, 2015 at 10:40 PM, Xu, Cheng A  wrote:

>  Hi Tianqi,
>
> Can you attach hive.log as more detailed information?
>
> +Sergio
>
>
>
> Yours,
>
> Ferdinand Xu
>
>
>
> *From:* Tianqi Tong [mailto:tt...@brightedge.com]
> *Sent:* Friday, April 10, 2015 1:34 AM
> *To:* user@hive.apache.org
> *Subject:* [Hive] Slow Loading Data Process with Parquet over 30k
> Partitions
>
>
>
> Hello Hive,
>
> I'm a developer using Hive to process TB level data, and I'm having some
> difficulty loading the data to table.
>
> I have 2 tables now:
>
>
>
> -- table_1:
>
> CREATE EXTERNAL TABLE `table_1`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition1 STRING)
>
> STORED AS RCfile
>
>
>
> -- table_2:
>
> CREATE EXTERNAL TABLE `table_2`(
>
>   `keyword` string,
>
>   `domain` string,
>
>   `url` string
>
>   )
>
> PARTITIONED BY (yearmonth INT, partition2 STRING)
>
> STORED AS Parquet
>
>
>
> I'm doing an INSERT OVERWRITE to table_2 from SELECT FROM table_1 with
> dynamic partitioning, and the number of partitions grows dramatically from
> 1500 to 40k (because I want to use something else as partitioning).
>
> The mapreduce job was fine.
>
> Somehow the process got stuck at "Loading data to table default.table_2
> (yearmonth=null, domain_prefix=null)", and I've been waiting for hours.
>
>
>
> Is this expected when we have 40k partitions?
>
>
>
> --
>
> Refs - Here are the parameters that I used:
>
> export HADOOP_HEAPSIZE=16384
>
> set PARQUET_FILE_SIZE=268435456;
>
> set parquet.block.size=268435456;
>
> set dfs.blocksize=268435456;
>
> set parquet.compression=SNAPPY;
>
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> SET hive.exec.max.dynamic.partitions=50;
>
> SET hive.exec.max.dynamic.partitions.pernode=5;
>
> SET hive.exec.max.created.files=100;
>
>
>
>
>
> Thank you very much!
>
> Tianqi Tong
>



-- 

Slava Markeyev | Engineering | Upsight


Re: rename a database

2015-03-27 Thread Slava Markeyev
Just to note, the current patch for HIVE-4847
<https://issues.apache.org/jira/browse/HIVE-4847> doesn't handle errors
very well, and you can potentially end up in an inconsistent state if there
is a failure along the way. Also, IIRC, external tables aren't handled
properly either.
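
For anyone considering the manual route discussed in the quoted thread below
(rename the directory, then edit the metastore), a rough, untested sketch;
the warehouse path and database names are placeholders, the SQL assumes a
MySQL-style metastore, and the DBS/SDS table and column names should be
verified against your metastore schema version. Back up the metastore first,
as Mich advises.

hdfs dfs -mv /user/hive/warehouse/olddb.db /user/hive/warehouse/newdb.db

-- then, against the metastore RDBMS:
UPDATE DBS
   SET NAME = 'newdb',
       DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, '/olddb.db', '/newdb.db')
 WHERE NAME = 'olddb';

UPDATE SDS
   SET LOCATION = REPLACE(LOCATION, '/olddb.db/', '/newdb.db/')
 WHERE LOCATION LIKE '%/olddb.db/%';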

-Slava

On Fri, Mar 27, 2015 at 9:14 AM, @Sanjiv Singh 
wrote:

> There is already a JIRA raised for this functionality, and a patch is
> available with the ticket.
>
> The patch moves the database directory on HDFS and changes the related
> metadata entities.
> Unit tests are also included.
>
> https://issues.apache.org/jira/browse/HIVE-4847
>
> I have not tried it yet.
>
> Regards,
> Sanjiv Singh
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 9:31 PM, Dr Mich Talebzadeh 
> wrote:
>
>>
>> Yep, it can happen in any server hosting a database or schema.
>>
>> I believe you will need to rename the database directory (the .db directory
>> under the Hive warehouse directory).
>>
>> Then you can hack the Hive metastore database. Mine is on SAP ASE. Certain
>> tables like DBS etc. in the metastore DB need to be changed; I cannot
>> remember which off the top of my head. Best to back up the database before
>> hacking it, and do it out of business hours.
>>
>> That is one approach. I am sure there are better ways.
>>
>> HTH,
>>
>> Mich
>>
>> On 27/3/2015, "Fabio C."  wrote:
>>
>> >Maybe they just typed time_shit instead of time_shift and found it out
>> >after 3 hours of table compression... I don't think it's too important,
>> >but what is the workaround? I'm also interested in this.
>> >Maybe it's just a matter of the metastore, and one could try to explore the
>> >metastore db to change how the database is referenced, possibly also having
>> >to change the DB folder name on HDFS... My two cents ;)
>> >
>> >Regards
>> >
>> >On Fri, Mar 27, 2015 at 3:10 PM, @Sanjiv Singh 
>> >wrote:
>> >
>> >> Can I know why you want to do so?
>> >>
>> >> Currently there is no command or direct way to do that, but I can
>> >> suggest a workaround for this.
>> >>
>> >> Thanks,
>> >> Sanjiv Singh
>> >>
>> >>
>> >> Regards
>> >> Sanjiv Singh
>> >> Mob :  +091 9990-447-339
>> >>
>> >> On Wed, Mar 25, 2015 at 10:01 AM, Shushant Arora <
>> >> shushantaror...@gmail.com> wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> Is there any way in hive0.10 to rename a database ?
>> >>>
>> >>> Thanks
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>
>


-- 

Slava Markeyev | Engineering | Upsight
<http://www.linkedin.com/in/slavamarkeyev>


Re: CSV file reading in hive

2015-02-12 Thread Slava Markeyev
You can use the LazySimpleSerDe with ROW FORMAT DELIMITED FIELDS TERMINATED
BY ',' ESCAPED BY '\\'. Check the DDL documentation for details:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
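
A rough sketch of the DDL form described above (table and column names are
made up; note this relies on backslash-escaped delimiters in the data rather
than the quoted style shown in the question):

CREATE TABLE payments (
  name STRING,
  id   INT,
  note STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
STORED AS TEXTFILE;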



On Thu, Feb 12, 2015 at 8:19 PM, Sreeman  wrote:

>  Hi All,
>
> How are all of you creating a Hive/Impala table when the CSV file has some
> values with a comma in between? It looks like:
>
> sree,12345,"payment made,but it is not successful"
>
>
>
>
>
> I know the OpenCSV serde is there, but it is not available in versions below
> Hive 0.14.0.
>
>
>



-- 

Slava Markeyev | Engineering | Upsight
Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>


Re: Hive Insert overwrite creating a single file with large block size

2015-01-09 Thread Slava Markeyev
You can control block size by setting dfs.block.size. However, I think you
might be asking how to control the size of and number of files generated on
insert. Is that correct?
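
If so, a sketch of the settings commonly involved (values are illustrative,
and the merge options assume MapReduce execution):

SET dfs.block.size=134217728;               -- HDFS block size for files written by the job
SET hive.merge.mapfiles=true;               -- merge small files from map-only jobs
SET hive.merge.mapredfiles=true;            -- merge small files from map-reduce jobs
SET hive.merge.size.per.task=256000000;     -- target size for merged files
SET hive.merge.smallfiles.avgsize=16000000; -- merge when average output file is smaller than this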

On Fri, Jan 9, 2015 at 4:41 PM, Buntu Dev  wrote:

> I got a bunch of small Avro files (<5MB) and have a table against those
> files. I created a new table and did an 'INSERT OVERWRITE' selecting from
> the existing table but did not find any option to provide the file block
> size. It currently creates a single file per partition.
>
> How do I specify the output block size during the 'INSERT OVERWRITE'?
>
> Thanks!
>



-- 

Slava Markeyev | Engineering | Upsight
<http://www.linkedin.com/in/slavamarkeyev>