Exporting Hive table data into Oracle gives a date format error

2013-03-13 Thread Ajit Kumar Shreevastava
Hi All,

Can you please let me know how I can bypass this error? I am currently using 
Apache Sqoop version 1.4.2.


[hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect 
jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER  --table 
BTTN_BKP_TEST --export-dir  /home/hadoop/user/hive/warehouse/bttn_bkp -P -m 1  
--input-fields-terminated-by '\0001' --verbose --input-null-string '\\N' 
--input-null-non-string '\\N'

Please set $HBASE_HOME to the root of your HBase installation.
13/03/13 18:20:42 DEBUG tool.BaseSqoopTool: Enabled debug logging.
Enter password:
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Loaded manager factory: 
com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Trying ManagerFactory: 
com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/13 18:20:47 DEBUG manager.DefaultManagerFactory: Trying with scheme: 
jdbc:oracle:thin:@10.99.42.11
13/03/13 18:20:47 DEBUG manager.OracleManager$ConnCache: Instantiated new 
connection cache.
13/03/13 18:20:47 INFO manager.SqlManager: Using default fetchSize of 1000
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Instantiated ConnManager 
org.apache.sqoop.manager.OracleManager@74b23210
13/03/13 18:20:47 INFO tool.CodeGenTool: Beginning code generation
13/03/13 18:20:47 DEBUG manager.OracleManager: Using column names query: SELECT 
t.* FROM BTTN_BKP_TEST t WHERE 1=0
13/03/13 18:20:47 DEBUG manager.OracleManager: Creating a new connection for 
jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER
13/03/13 18:20:47 DEBUG manager.OracleManager: No connection paramenters 
specified. Using regular API for making connection.
13/03/13 18:20:47 INFO manager.OracleManager: Time zone has been set to GMT
13/03/13 18:20:47 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
13/03/13 18:20:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM BTTN_BKP_TEST t WHERE 1=0
13/03/13 18:20:47 DEBUG manager.OracleManager$ConnCache: Caching released 
connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER
13/03/13 18:20:47 DEBUG orm.ClassWriter: selected columns:
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BTTN_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   DATA_INST_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   SCR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BTTN_NU
13/03/13 18:20:47 DEBUG orm.ClassWriter:   CAT
13/03/13 18:20:47 DEBUG orm.ClassWriter:   WDTH
13/03/13 18:20:47 DEBUG orm.ClassWriter:   HGHT
13/03/13 18:20:47 DEBUG orm.ClassWriter:   KEY_SCAN
13/03/13 18:20:47 DEBUG orm.ClassWriter:   KEY_SHFT
13/03/13 18:20:47 DEBUG orm.ClassWriter:   FRGND_CPTN_COLR
13/03/13 18:20:47 DEBUG orm.ClassWriter:   FRGND_CPTN_COLR_PRSD
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BKGD_CPTN_COLR
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BKGD_CPTN_COLR_PRSD
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BLM_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter:   LCLZ_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter:   MENU_ITEM_NU
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BTTN_ASGN_LVL_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   ON_ATVT
13/03/13 18:20:47 DEBUG orm.ClassWriter:   ON_CLIK
13/03/13 18:20:47 DEBUG orm.ClassWriter:   ENBL_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BLM_SET_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BTTN_ASGN_LVL_NAME
13/03/13 18:20:47 DEBUG orm.ClassWriter:   MKT_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   CRTE_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter:   CRTE_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   UPDT_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter:   UPDT_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   DEL_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter:   DEL_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   DLTD_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter:   MENU_ITEM_NA
13/03/13 18:20:47 DEBUG orm.ClassWriter:   PRD_CD
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BLM_SET_NA
13/03/13 18:20:47 DEBUG orm.ClassWriter:   SOUND_FILE_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   IS_DYNMC_BTTN
13/03/13 18:20:47 DEBUG orm.ClassWriter:   FRGND_CPTN_COLR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   FRGND_CPTN_COLR_PRSD_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BKGD_CPTN_COLR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter:   BKGD_CPTN_COLR_PRSD_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: Writing source file: 
/tmp/sqoop-hadoop/compile/69b6a9d2ebb99cebced808e559528531/BTTN_BKP_TEST.java
13/03/13 18:20:47 DEBUG orm.ClassWriter: Table name: BTTN_BKP_TEST
13/03/13 18:20:47 DEBUG orm.ClassWriter: Columns: BTTN_ID:2, DATA_INST_ID:2, 
SCR_ID:2, BTTN_NU:2, CAT:2, WDTH:2, HGHT:2, KEY_SCAN:2, KEY_SHFT:2, 
FRGND_CPTN_COLR:12, FRGND_CPTN_COLR_PRSD:12, BKGD_CPTN_COLR:12, 
BKGD_CPTN_COLR_PRSD:12, BLM_FL:2, LCLZ_FL:2, MENU_ITEM_NU:2, 
BTTN_ASGN_LVL_ID:2, ON_ATVT:2, ON_CLIK:2, ENBL_FL:2, BLM_SET_ID:2, 
BTTN_ASGN_LVL_NAME:12, MKT_ID:2, CRTE_TS:93, CRTE_USER_ID:12, UPDT_TS:93, 
UPDT_USER_ID:12, DEL_TS:93, DEL_USER_ID:12, DLTD_FL:2, MENU_ITEM_NA:12, 
PRD_CD:2, BLM_SET_NA:12, 
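
Since the error message itself is cut off above, one hedged first check: in the 
generated class the CRTE_TS, UPDT_TS and DEL_TS columns are JDBC type 93 
(TIMESTAMP), and Sqoop's generated parser reads such fields with 
java.sql.Timestamp.valueOf(), which only accepts values of the form 
yyyy-mm-dd hh:mm:ss[.fffffffff]. A quick sketch for inspecting those fields in 
the export directory (the field positions 24, 26 and 28 are inferred from the 
column list above and may need adjusting; \001 is the Ctrl-A delimiter passed 
via --input-fields-terminated-by):

# print the timestamp fields from the first few exported rows
hadoop fs -cat /home/hadoop/user/hive/warehouse/bttn_bkp/* | head -5 \
  | awk -F $'\001' '{print $24, $26, $28}'   # CRTE_TS, UPDT_TS, DEL_TS (assumed positions)

Values should look like 2013-03-13 18:20:47 (optionally with a fractional 
second); a value in any other date format, or an empty string instead of the 
\N null token, will typically fail to parse during the export.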

Add external jars automatically

2013-03-13 Thread Krishna Rao
Hi all,

I'm using the Hive JSON SerDe and need to run ADD JAR
/usr/lib/hive/lib/hive-json-serde-0.2.jar; before I can use tables that
require it.

Is it possible to have this jar available automatically?

I could do it by adding the statement to a .hiverc file, but I was
wondering if there is a better way...

Cheers,

Krishna


Re: Add external jars automatically

2013-03-13 Thread Alex Kozlov
If you look into the ${HIVE_HOME}/bin/hive script, there are multiple ways to
add the jar. One of my favorites, besides the .hiverc file, has been to put
the jar into the ${HIVE_HOME}/auxlib dir. There is always the
HIVE_AUX_JARS_PATH environment variable (but this introduces a dependency
on the environment).
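
A quick sketch of those options, reusing the jar path from the original
question (the exact .hiverc location depends on the setup):

# 1) .hiverc: the Hive CLI runs these statements automatically at startup
echo 'ADD JAR /usr/lib/hive/lib/hive-json-serde-0.2.jar;' >> ~/.hiverc

# 2) auxlib: bin/hive adds every jar in ${HIVE_HOME}/auxlib to the aux jars path
mkdir -p ${HIVE_HOME}/auxlib
cp /usr/lib/hive/lib/hive-json-serde-0.2.jar ${HIVE_HOME}/auxlib/

# 3) environment variable (introduces a dependency on the environment)
export HIVE_AUX_JARS_PATH=/usr/lib/hive/lib/hive-json-serde-0.2.jar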



Re: Add external jars automatically

2013-03-13 Thread Krishna Rao
Ah great, the auxlib dir option sounds perfect.

Cheers



Introducing Parquet: efficient columnar storage for Hadoop

2013-03-13 Thread Jarek Jarcec Cecho
Fellow Hive users,
We'd like to introduce a joint project between Twitter and Cloudera engineers 
-- a new columnar storage format for Hadoop called Parquet [1]. The official 
announcement is available on the Cloudera blog [2].

Parquet is designed to bring efficient columnar storage to Hadoop. Compared to, 
and learning from, the initial work done toward this goal in Trevni, Parquet 
includes the following enhancements:

* Efficiently encode nested structures and sparsely populated data based on the 
Google Dremel definition/repetition levels
* Provide extensible support for per-column encodings (e.g. delta, run length, 
etc)
* Provide extensibility of storing multiple types of data in column data (e.g. 
indexes, bloom filters, statistics)
* Offer better write performance by storing metadata at the end of the file

Based on feedback from the Impala beta and after a joint evaluation with 
Twitter, we determined that these further improvements to the Trevni design 
were necessary to provide a more efficient format that we can evolve going 
forward for production usage. Furthermore, we found it appropriate to host and 
develop the columnar file format outside of the Avro project (unlike Trevni, 
which is part of Avro) because Avro is just one of many input data formats that 
can be used with Parquet.

We created Parquet to make the advantages of compressed, efficient columnar 
data representation available to any project in the Hadoop ecosystem, 
regardless of the choice of data processing framework, data model, or 
programming language.

Parquet is built from the ground up with complex nested data structures in 
mind. We adopted the repetition/definition level approach to encoding such data 
structures, as described in Google's Dremel paper; we have found this to be a 
very efficient method of encoding data in non-trivial object schemas.

Parquet is built to support very efficient compression and encoding schemes. 
Parquet allows compression schemes to be specified on a per-column level, and 
is future-proofed to allow adding more encodings as they are invented and 
implemented. We separate the concepts of encoding and compression, allowing 
Parquet consumers to implement operators that work directly on encoded data 
without paying decompression and decoding penalty when possible.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data 
processing frameworks, and we are not interested in playing favorites. We 
believe that an efficient, well-implemented columnar storage substrate should 
be useful to all frameworks without the cost of extensive and difficult to set 
up dependencies.

The initial code defines the file format, provides Java building blocks for 
processing columnar data, and implements Hadoop Input/Output Formats, Pig
Storers/Loaders, and an example of a complex integration - Input/Output formats 
that can convert Parquet-stored data directly to and from Thrift objects.

Twitter is starting to convert some of its major data sources to Parquet in 
order to take advantage of the compression and deserialization savings.

Parquet is currently under heavy development. Parquet's near-term roadmap 
includes:

   1. Hive SerDes (Criteo)
   2. Cascading Taps (Criteo)
   3. Support for dictionary encoding, zigzag encoding, and RLE encoding of 
data (Cloudera and Twitter)
   4. Further improvements to Pig support (Twitter)

Company names in parentheses indicate whose engineers signed up to do the work 
- others should feel free to jump in too, of course.

We've also heard requests to provide an Avro container layer, similar to what 
we do with Thrift. Seeking volunteers!

We welcome all feedback, patches, and ideas; to foster community development, 
we plan to contribute Parquet to the Apache Incubator when the development is 
farther along.

Regards,

Nong Li (Cloudera)
Julien Le Dem (Twitter)
Marcel Kornacker (Cloudera)
Todd Lipcon (Cloudera)
Dmitriy Ryaboy (Twitter)
Jonathan Coveney (Twitter)
Justin Coffey (Criteo)
Mickael Lacour (Criteo)
and friends.


Jarcec

Links:
1: http://parquet.github.com
2: 
http://blog.cloudera.com/blog/2013/03/introducing-parquet-columnar-storage-for-apache-hadoop/

