[
https://issues.apache.org/jira/browse/TAJO-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970483#comment-13970483
]
Hyunsik Choi edited comment on TAJO-711 at 4/16/14 6:21 AM:
------------------------------------------------------------
Excellent! Big +1 for the latest patch. I tested it on a local cluster, and it
works perfectly. Thank you for your awesome contribution! I'll commit it
tonight if there are no further comments.
I have one trivial suggestion. A FileScanner instance, including AvroScanner,
can be created and then closed without {{FileScanner::init()}} ever being
invoked. I'm sorry for not mentioning this in the javadoc. In any case,
{{FileScanner::close()}} should check its member variables for null before
closing them.
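For illustration, here is a minimal sketch of that null check. The class and field names ({{NullSafeScanner}}, {{dataFileReader}}, {{inputStream}}) are hypothetical and are not taken from the patch. The sketch uses only {{java.io.Closeable}} so that it compiles on its own.

{code:java}
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a scanner such as AvroScanner. Its resources are
// only assigned in init(), so they are still null if close() is called on a
// scanner that was never initialized.
class NullSafeScanner implements Closeable {
  private Closeable dataFileReader; // hypothetical field, opened in init()
  private Closeable inputStream;    // hypothetical field, opened in init()
  private boolean closed = false;

  void init(Closeable reader, Closeable stream) {
    this.dataFileReader = reader;
    this.inputStream = stream;
  }

  @Override
  public void close() throws IOException {
    // Guard every member, because init() may never have been invoked.
    if (dataFileReader != null) {
      dataFileReader.close();
      dataFileReader = null; // makes a second close() harmless
    }
    if (inputStream != null) {
      inputStream.close();
      inputStream = null;
    }
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}
{code}

With this guard, closing a scanner that never went through {{init()}} skips the unopened resources. Resetting the fields to null also makes a repeated {{close()}} a no-op.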
As mentioned, I tested the patch on a local cluster. First, I prepared the
Avro schema as follows:
{code}
{
  "type": "record",
  "namespace": "org.apache.tajo",
  "name": "table1",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" }
  ]
}
{code}
Then, I created one database and one table as follows:
{code}
default> create database avro;
Ok
default> \c avro
avro> create table avro2 (id int, name text) using avro with ('avro.schema.url' = 'file:///home/hyunsik/schema.avsc');
Ok
avro> \d avro2
table name: avro.avro2
table path: hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2
store type: AVRO
number of rows: 0
volume: 0 B
Options:
'avro.schema.url'='file:///home/hyunsik/schema.avsc'
schema:
id INT4
name TEXT
{code}
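The fields of the Avro record line up one to one with the columns reported by {{\d}}. As a rough illustration only, that correspondence can be written as a small lookup. The class and method names are hypothetical. Only the two type pairs observed in this test ({{int}} to {{INT4}}, {{string}} to {{TEXT}}) are included, and other Avro types are left unmapped rather than guessed.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that records the Avro-to-Tajo type pairs seen in this
// test. It covers only the two observed pairs and rejects anything else.
class AvroToTajoTypes {
  private static final Map<String, String> OBSERVED = new LinkedHashMap<>();
  static {
    OBSERVED.put("int", "INT4");    // "id" field -> id INT4
    OBSERVED.put("string", "TEXT"); // "name" field -> name TEXT
  }

  static String tajoType(String avroType) {
    String t = OBSERVED.get(avroType);
    if (t == null) {
      throw new IllegalArgumentException("unmapped Avro type: " + avroType);
    }
    return t;
  }

  // Renders {fieldName, avroType} pairs as "name TYPE" lines, in record order,
  // in the same shape as the schema section printed by \d.
  static String describe(List<String[]> fields) {
    StringJoiner out = new StringJoiner("\n");
    for (String[] f : fields) {
      out.add(f[0] + " " + tajoType(f[1]));
    }
    return out.toString();
  }
}
{code}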
Next, I inserted 6,001,215 rows into the Avro table via an {{INSERT OVERWRITE
INTO}} statement as follows:
{code}
avro> insert overwrite into avro2 (id, name) select l_orderkey::int4, l_returnflag from tpch.lineitem;
Progress: 8%, response time: 0.397 sec
Progress: 17%, response time: 1.2 sec
Progress: 69%, response time: 2.202 sec
Progress: 100%, response time: 2.909 sec
final state: QUERY_SUCCEEDED, response time: 2.909 sec
OK
{code}
Then, I checked the generated files:
{noformat}
[hyunsik@local05 hadoop-2.3.0]$ bin/hadoop dfs -ls hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hyunsik/Code/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/04/16 14:43:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 23 items
-rw-r--r-- 3 hyunsik supergroup 1331444 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000000
-rw-r--r-- 3 hyunsik supergroup 1335487 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000001
-rw-r--r-- 3 hyunsik supergroup 1335522 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000002
-rw-r--r-- 3 hyunsik supergroup 1351444 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000003
-rw-r--r-- 3 hyunsik supergroup 1590096 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000004
-rw-r--r-- 3 hyunsik supergroup 1590222 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000005
-rw-r--r-- 3 hyunsik supergroup 1589538 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000006
-rw-r--r-- 3 hyunsik supergroup 1590408 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000007
-rw-r--r-- 3 hyunsik supergroup 1590168 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000008
-rw-r--r-- 3 hyunsik supergroup 1589226 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000009
-rw-r--r-- 3 hyunsik supergroup 1589688 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000010
-rw-r--r-- 3 hyunsik supergroup 1589790 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000011
-rw-r--r-- 3 hyunsik supergroup 1590048 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000012
-rw-r--r-- 3 hyunsik supergroup 1590204 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000013
-rw-r--r-- 3 hyunsik supergroup 1590234 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000014
-rw-r--r-- 3 hyunsik supergroup 1589562 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000015
-rw-r--r-- 3 hyunsik supergroup 1590276 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000016
-rw-r--r-- 3 hyunsik supergroup 1590720 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000017
-rw-r--r-- 3 hyunsik supergroup 1590198 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000018
-rw-r--r-- 3 hyunsik supergroup 1589508 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000019
-rw-r--r-- 3 hyunsik supergroup 1590042 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000020
-rw-r--r-- 3 hyunsik supergroup 1589814 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000021
-rw-r--r-- 3 hyunsik supergroup 1026861 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000022
{noformat}
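As a quick sanity check on the output size, the 23 part files listed add up to 35,000,500 bytes (about 33.4 MiB). That is roughly 5.8 bytes per row for the 6,001,215 inserted rows. The sketch below redoes that arithmetic from the listed sizes. The class name is hypothetical.

{code:java}
// Sums the part-file sizes from the "hadoop dfs -ls" listing and divides by
// the number of inserted rows to get the average on-disk bytes per row.
class AvroOutputSize {
  // Sizes of part-01-000000 .. part-01-000022, in listing order.
  static final long[] PART_SIZES = {
    1331444, 1335487, 1335522, 1351444, 1590096, 1590222, 1589538, 1590408,
    1590168, 1589226, 1589688, 1589790, 1590048, 1590204, 1590234, 1589562,
    1590276, 1590720, 1590198, 1589508, 1590042, 1589814, 1026861
  };
  static final long ROWS = 6_001_215L; // rows written by INSERT OVERWRITE

  static long totalBytes() {
    long sum = 0;
    for (long s : PART_SIZES) {
      sum += s;
    }
    return sum;
  }

  static double bytesPerRow() {
    return (double) totalBytes() / ROWS;
  }
}
{code}

The total covers everything in the Avro container files, including each file's header and sync markers, not only the encoded {{id}} and {{name}} values.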
Finally, I executed some simple and distributed queries:
{noformat}
avro> select id from avro2 limit 10;
Progress: 100%, response time: 0.351 sec
final state: QUERY_SUCCEEDED, response time: 0.351 sec
result: 10 rows (80 B)
id
-------------------------------
1860579
1860579
1860579
1860580
1860580
1860580
1860580
1860580
1860580
1860581
avro> select id, name from avro2 order by id asc limit 10;
Progress: 8%, response time: 0.399 sec
Progress: 41%, response time: 1.202 sec
Progress: 100%, response time: 1.574 sec
final state: QUERY_SUCCEEDED, response time: 1.574 sec
result: 10 rows (40 B)
id, name
-------------------------------
1, N
1, N
1, N
1, N
1, N
1, N
2, N
3, R
3, R
3, A
avro> select id, name from avro2 order by id desc limit 10;
Progress: 6%, response time: 0.401 sec
Progress: 45%, response time: 1.203 sec
Progress: 100%, response time: 1.551 sec
final state: QUERY_SUCCEEDED, response time: 1.551 sec
result: 10 rows (100 B)
id, name
-------------------------------
6000000, N
6000000, N
5999975, R
5999975, A
5999975, A
5999974, R
5999974, R
5999973, N
5999972, N
5999972, N
avro> select count(id), count(name) from avro2;
Progress: 19%, response time: 0.401 sec
Progress: 100%, response time: 0.776 sec
final state: QUERY_SUCCEEDED, response time: 0.776 sec
result: 1 rows (16 B)
?count, ?count_1
-------------------------------
6001215, 6001215
{noformat}
> Add Avro storage support
> ------------------------
>
> Key: TAJO-711
> URL: https://issues.apache.org/jira/browse/TAJO-711
> Project: Tajo
> Issue Type: New Feature
> Reporter: David Chen
> Assignee: David Chen
> Attachments: TAJO-711.patch, TAJO-711.patch,
> TAJO-711_140415_rebased.patch, TAJO-711_20140413_20:36:40.patch,
> TAJO-711_20140413_21:00:34.patch, TAJO-711_20140413_21:46:27.patch,
> TAJO-711_20140414_11:07:13.patch, TAJO-711_20140415_11:13:43.patch
>
>
> Add {{FileScanner}} and {{FileAppender}} for reading from and writing to Avro.
--
This message was sent by Atlassian JIRA
(v6.2#6252)