Exporting Hive table data into Oracle gives a date format error
Hi all, can you please let me know how I can bypass this error? I am currently using Apache Sqoop version 1.4.2.

[hadoop@NHCLT-PC44-2 sqoop-oper]$ sqoop export --connect jdbc:oracle:thin:@10.99.42.11:1521/clouddb --username HDFSUSER --table BTTN_BKP_TEST --export-dir /home/hadoop/user/hive/warehouse/bttn_bkp -P -m 1 --input-fields-terminated-by '\0001' --verbose --input-null-string '\\N' --input-null-non-string '\\N'

Please set $HBASE_HOME to the root of your HBase installation.
13/03/13 18:20:42 DEBUG tool.BaseSqoopTool: Enabled debug logging.
Enter password:
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
13/03/13 18:20:47 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:oracle:thin:@10.99.42.11
13/03/13 18:20:47 DEBUG manager.OracleManager$ConnCache: Instantiated new connection cache.
13/03/13 18:20:47 INFO manager.SqlManager: Using default fetchSize of 1000
13/03/13 18:20:47 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.OracleManager@74b23210
13/03/13 18:20:47 INFO tool.CodeGenTool: Beginning code generation
13/03/13 18:20:47 DEBUG manager.OracleManager: Using column names query: SELECT t.* FROM BTTN_BKP_TEST t WHERE 1=0
13/03/13 18:20:47 DEBUG manager.OracleManager: Creating a new connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb, using username: HDFSUSER
13/03/13 18:20:47 DEBUG manager.OracleManager: No connection paramenters specified. Using regular API for making connection.
13/03/13 18:20:47 INFO manager.OracleManager: Time zone has been set to GMT
13/03/13 18:20:47 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
13/03/13 18:20:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM BTTN_BKP_TEST t WHERE 1=0
13/03/13 18:20:47 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.99.42.11:1521/clouddb/HDFSUSER
13/03/13 18:20:47 DEBUG orm.ClassWriter: selected columns:
13/03/13 18:20:47 DEBUG orm.ClassWriter: BTTN_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: DATA_INST_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: SCR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: BTTN_NU
13/03/13 18:20:47 DEBUG orm.ClassWriter: CAT
13/03/13 18:20:47 DEBUG orm.ClassWriter: WDTH
13/03/13 18:20:47 DEBUG orm.ClassWriter: HGHT
13/03/13 18:20:47 DEBUG orm.ClassWriter: KEY_SCAN
13/03/13 18:20:47 DEBUG orm.ClassWriter: KEY_SHFT
13/03/13 18:20:47 DEBUG orm.ClassWriter: FRGND_CPTN_COLR
13/03/13 18:20:47 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_PRSD
13/03/13 18:20:47 DEBUG orm.ClassWriter: BKGD_CPTN_COLR
13/03/13 18:20:47 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_PRSD
13/03/13 18:20:47 DEBUG orm.ClassWriter: BLM_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter: LCLZ_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter: MENU_ITEM_NU
13/03/13 18:20:47 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: ON_ATVT
13/03/13 18:20:47 DEBUG orm.ClassWriter: ON_CLIK
13/03/13 18:20:47 DEBUG orm.ClassWriter: ENBL_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter: BLM_SET_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: BTTN_ASGN_LVL_NAME
13/03/13 18:20:47 DEBUG orm.ClassWriter: MKT_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: CRTE_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter: CRTE_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: UPDT_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter: UPDT_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: DEL_TS
13/03/13 18:20:47 DEBUG orm.ClassWriter: DEL_USER_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: DLTD_FL
13/03/13 18:20:47 DEBUG orm.ClassWriter: MENU_ITEM_NA
13/03/13 18:20:47 DEBUG orm.ClassWriter: PRD_CD
13/03/13 18:20:47 DEBUG orm.ClassWriter: BLM_SET_NA
13/03/13 18:20:47 DEBUG orm.ClassWriter: SOUND_FILE_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: IS_DYNMC_BTTN
13/03/13 18:20:47 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: FRGND_CPTN_COLR_PRSD_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: BKGD_CPTN_COLR_PRSD_ID
13/03/13 18:20:47 DEBUG orm.ClassWriter: Writing source file: /tmp/sqoop-hadoop/compile/69b6a9d2ebb99cebced808e559528531/BTTN_BKP_TEST.java
13/03/13 18:20:47 DEBUG orm.ClassWriter: Table name: BTTN_BKP_TEST
13/03/13 18:20:47 DEBUG orm.ClassWriter: Columns: BTTN_ID:2, DATA_INST_ID:2, SCR_ID:2, BTTN_NU:2, CAT:2, WDTH:2, HGHT:2, KEY_SCAN:2, KEY_SHFT:2, FRGND_CPTN_COLR:12, FRGND_CPTN_COLR_PRSD:12, BKGD_CPTN_COLR:12, BKGD_CPTN_COLR_PRSD:12, BLM_FL:2, LCLZ_FL:2, MENU_ITEM_NU:2, BTTN_ASGN_LVL_ID:2, ON_ATVT:2, ON_CLIK:2, ENBL_FL:2, BLM_SET_ID:2, BTTN_ASGN_LVL_NAME:12, MKT_ID:2, CRTE_TS:93, CRTE_USER_ID:12, UPDT_TS:93, UPDT_USER_ID:12, DEL_TS:93, DEL_USER_ID:12, DLTD_FL:2, MENU_ITEM_NA:12, PRD_CD:2, BLM_SET_NA:12,
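Since the log above is cut off before the exception itself, one common cause is worth checking: the three TIMESTAMP columns (CRTE_TS, UPDT_TS, DEL_TS -- JDBC type 93 in the Columns line above). Sqoop's generated export code parses timestamp fields with java.sql.Timestamp.valueOf(), which only accepts strings of the form yyyy-mm-dd hh:mm:ss[.fffffffff]. If the Hive table stores these columns in any other format, one workaround is to rewrite them into that form in a staging table and export the staging table instead. A minimal sketch -- the staging table name bttn_bkp_export and the source pattern 'dd-MMM-yy' are assumptions for illustration, not taken from the post:

-- Rewrite the timestamp columns into the yyyy-MM-dd HH:mm:ss form that
-- java.sql.Timestamp.valueOf() accepts. Only the key and the three
-- timestamp columns are shown; add the remaining columns of bttn_bkp
-- unchanged before exporting.
CREATE TABLE bttn_bkp_export AS
SELECT
  bttn_id,
  from_unixtime(unix_timestamp(crte_ts, 'dd-MMM-yy'), 'yyyy-MM-dd HH:mm:ss') AS crte_ts,
  from_unixtime(unix_timestamp(updt_ts, 'dd-MMM-yy'), 'yyyy-MM-dd HH:mm:ss') AS updt_ts,
  from_unixtime(unix_timestamp(del_ts, 'dd-MMM-yy'), 'yyyy-MM-dd HH:mm:ss') AS del_ts
FROM bttn_bkp;

Then re-run the same sqoop export command with --export-dir pointed at the new table's warehouse directory (/home/hadoop/user/hive/warehouse/bttn_bkp_export in this layout).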
Add external jars automatically
Hi all, I'm using the Hive JSON SerDe and need to run ADD JAR /usr/lib/hive/lib/hive-json-serde-0.2.jar; before I can use tables that require it. Is it possible to have this jar available automatically? I could do it by adding the statement to a .hiverc file, but I was wondering if there is a better way... Cheers, Krishna
Re: Add external jars automatically
If you look into the ${HIVE_HOME}/bin/hive script, there are multiple ways to add the jar (sketched below). One of my favorites, besides the .hiverc file, has been to put the jar into the ${HIVE_HOME}/auxlib dir. There is always the HIVE_AUX_JARS_PATH environment variable as well (but this introduces a dependency on the environment).

On Wed, Mar 13, 2013 at 10:26 AM, Krishna Rao krishnanj...@gmail.com wrote:

Hi all, I'm using the Hive JSON SerDe and need to run ADD JAR /usr/lib/hive/lib/hive-json-serde-0.2.jar; before I can use tables that require it. Is it possible to have this jar available automatically? I could do it by adding the statement to a .hiverc file, but I was wondering if there is a better way... Cheers, Krishna
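In script form, the options above look like this -- a sketch assuming a Hive install under /usr/lib/hive with HIVE_HOME set accordingly; adjust the paths to your layout:

# Option 1: drop the jar into ${HIVE_HOME}/auxlib (create the dir if absent);
# the hive script adds everything in it to the classpath at startup.
mkdir -p ${HIVE_HOME}/auxlib
cp /usr/lib/hive/lib/hive-json-serde-0.2.jar ${HIVE_HOME}/auxlib/

# Option 2: one ADD JAR line in ~/.hiverc, executed at the start of every CLI session.
echo "ADD JAR /usr/lib/hive/lib/hive-json-serde-0.2.jar;" >> ~/.hiverc

# Option 3: the environment variable (check bin/hive for the exact form your
# version expects; typically a comma-separated list of jar paths).
export HIVE_AUX_JARS_PATH=/usr/lib/hive/lib/hive-json-serde-0.2.jar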
Re: Add external jars automatically
Ah great, the auxlib dir option sounds perfect. Cheers

On 13 March 2013 17:41, Alex Kozlov ale...@cloudera.com wrote:

If you look into the ${HIVE_HOME}/bin/hive script, there are multiple ways to add the jar. One of my favorites, besides the .hiverc file, has been to put the jar into the ${HIVE_HOME}/auxlib dir. There is always the HIVE_AUX_JARS_PATH environment variable as well (but this introduces a dependency on the environment).

On Wed, Mar 13, 2013 at 10:26 AM, Krishna Rao krishnanj...@gmail.com wrote:

Hi all, I'm using the Hive JSON SerDe and need to run ADD JAR /usr/lib/hive/lib/hive-json-serde-0.2.jar; before I can use tables that require it. Is it possible to have this jar available automatically? I could do it by adding the statement to a .hiverc file, but I was wondering if there is a better way... Cheers, Krishna
Introducing Parquet: efficient columnar storage for Hadoop
Fellow Hive users,

We'd like to introduce a joint project between Twitter and Cloudera engineers -- a new columnar storage format for Hadoop called Parquet [1]. The official announcement is available on the Cloudera blog [2].

Parquet is designed to bring efficient columnar storage to Hadoop. Compared to, and learning from, the initial work done toward this goal in Trevni, Parquet includes the following enhancements:

* Efficiently encode nested structures and sparsely populated data based on the Google Dremel definition/repetition levels
* Provide extensible support for per-column encodings (e.g. delta, run-length, etc.)
* Provide extensibility for storing multiple types of data in column data (e.g. indexes, bloom filters, statistics)
* Offer better write performance by storing metadata at the end of the file

Based on feedback from the Impala beta and after a joint evaluation with Twitter, we determined that these further improvements to the Trevni design were necessary to provide a more efficient format that we can evolve going forward for production usage. Furthermore, we found it appropriate to host and develop the columnar file format outside of the Avro project (unlike Trevni, which is part of Avro) because Avro is just one of many input data formats that can be used with Parquet.

We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.

Parquet is built from the ground up with complex nested data structures in mind. We adopted the repetition/definition level approach to encoding such data structures, as described in Google's Dremel paper (a small worked example appears below); we have found this to be a very efficient method of encoding data in non-trivial object schemas.

Parquet is built to support very efficient compression and encoding schemes. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. We separate the concepts of encoding and compression, allowing Parquet consumers to implement operators that work directly on encoded data without paying a decompression and decoding penalty when possible.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult-to-set-up dependencies.

The initial code defines the file format, provides Java building blocks for processing columnar data, and implements Hadoop Input/Output Formats, Pig Storers/Loaders, and an example of a complex integration -- Input/Output formats that can convert Parquet-stored data directly to and from Thrift objects. Twitter is starting to convert some of its major data sources to Parquet in order to take advantage of the compression and deserialization savings.

Parquet is currently under heavy development. Parquet's near-term roadmap includes:

1. Hive SerDes (Criteo)
2. Cascading Taps (Criteo)
3. Support for dictionary encoding, zigzag encoding, and RLE encoding of data (Cloudera and Twitter)
4. Further improvements to Pig support (Twitter)

Company names in parentheses indicate whose engineers signed up to do the work -- others can feel free to jump in too, of course.
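For readers who haven't seen the Dremel encoding before, here is a small hand-worked illustration (the AddressBook schema is our own example, not taken from the announcement). Consider the schema:

message AddressBook {
  required string owner;
  repeated group contacts {
    required string name;
    optional string phoneNumber;
  }
}

For the column contacts.phoneNumber, the definition level records how many of the optional/repeated fields along the path are actually present (at most 2 here: contacts and phoneNumber), and the repetition level records at which repeated field in the path a value starts a new list entry (0 means a new record). The record

AddressBook {
  owner: "Julien",
  contacts: { name: "Dmitriy", phoneNumber: "555 987 6543" },
  contacts: { name: "Chris" }   -- no phone number
}

is stored in the contacts.phoneNumber column as two entries: ("555 987 6543", repetition=0, definition=2) and (null, repetition=1, definition=1). Nulls and nesting are thus captured by two small integers per value, which is why sparsely populated nested data encodes so compactly.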
We've also heard requests to provide an Avro container layer, similar to what we do with Thrift. Seeking volunteers!

We welcome all feedback, patches, and ideas; to foster community development, we plan to contribute Parquet to the Apache Incubator when the development is farther along.

Regards,
Nong Li (Cloudera), Julien Le Dem (Twitter), Marcel Kornacker (Cloudera), Todd Lipcon (Cloudera), Dmitriy Ryaboy (Twitter), Jonathan Coveney (Twitter), Justin Coffey (Criteo), Mickael Lacour (Criteo), and friends.

Jarcec

Links:
1: http://parquet.github.com
2: http://blog.cloudera.com/blog/2013/03/introducing-parquet-columnar-storage-for-apache-hadoop/