[ https://issues.apache.org/jira/browse/SPARK-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-1661. ------------------------------------- Resolution: Won't Fix > the result of querying table created with RegexSerDe is all null > ---------------------------------------------------------------- > > Key: SPARK-1661 > URL: https://issues.apache.org/jira/browse/SPARK-1661 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 0.9.0 > Environment: linux 2.6.32-358.el6.x86_64,Hive 12.0,shark 0.9.0,Hadoop > 2.2.0 > Reporter: likunjian > Labels: HQL, hadoop, hive, regex, shark > Attachments: log.txt > > Original Estimate: 168h > Remaining Estimate: 168h > > the result of querying table created with RegexSerDe is all null > when i query the table created with > org.apache.hadoop.hive.contrib.serde2.RegexSerDe by shark,the columns in the > result is all null > select * from access_log where logdate='2014-04-28' limit 10; > OK > ip host time method request protocol status size > referer cookieuid requesttime session httpxrequestedwith agent > upstreamresponsetime logdate > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL > NULL NULL NULL NULL NULL 2014-04-28 > Time taken: 4.362 seconds > my regex is > ^([^ ]*) [^ ]* ([^ ]*) \\[([^\]]*)\\] \"([^ ]*) ([^ ]*) ([^ ]*)\" (-|[0-9]*) > (-|[0-9]*) \"(\.\+\?|-)\" ([^ ]*) ([^ ]*) ([^ ]*) \"(\.\+\?|-)\" > \"(\.\+\?|-)\" \"(\.\+\?|-)\"$ > nginx log example: > 42.49.44.61 - www.xxxx.comm [20/Apr/2014:23:58:03 +0800] "GET /xxxxx/296837 > HTTP/1.1" 200 3871 "http://www.xxxxx.com/xxxxx/296837" - 0.015 > 63hbb4om2cvtjs0f7d969n1uf4 "com.xxxxx.browser" "Mozilla/5.0 (Linux; U; xxxxx > 4.1.2; zh-cn; ZTE N919 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) > Version/4.0 Mobile Safari/534.30" "0.015" > 111.121.176.149 - www.xxxx.comm [20/Apr/2014:23:58:03 +0800] "GET > /xxxxx/264904 HTTP/1.1" 200 3827 > "http://m.baidu.com/s?from=2001a&bd_page_type=1&word=%E8%8E%B2%E8%97%95%E6%80%8E%E6%A0%B7%E5%8D%A4%E6%89%8D%E5%A5%BD%E5%90%83" > - 0.015 ft7tr4b06b23ub9lnugdf4gcq3 "-" "Mozilla/5.0 (Linux; U; xxxxx 4.1.2; > zh-CN; 8190Q Build/JZO54K) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 > UCBrowser/9.5.2.394 U3/0.8.0 Mobile Safari/533.1" "0.015" > 222.209.97.169 - www.xxxx.comm [20/Apr/2014:23:58:04 +0800] "GET / HTTP/1.1" > 200 3188 "http://m.idea123.cn/food.html" - 0.014 - "-" "Lenovo S890/S100 > Linux/3.0.13 xxxxx/4.0.3 Release/12.12.2011 Browser/AppleWebKit534.30 > Profile/MIDP-2.0 Configuration/CLDC-1.1 Mobile Safari/534.30" "0.014" > 59.36.84.241 - www.xxxx.comm [20/Apr/2014:23:58:05 +0800] "GET > /app/xxxxx/topic/view.php?id=138555 HTTP/1.1" 200 3151 "-" - 0.009 - "-" > "Mozilla/5.0 (Linux; U; xxxxx 2.3.7; zh-cn; TD500 Build/GWK74) > AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" > "0.009" > 113.242.39.81 - www.xxxx.comm [20/Apr/2014:23:58:07 +0800] "GET /xxxxx/419691 > HTTP/1.1" 200 4174 "http://www.xxxx.comm/xxxxx/all/308?p=3" - 0.013 > 1n579ukg1gho7i7mr3q8ic8j97 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X > 10_5_7; en-us) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 > Safari/530.17; 360browser(securitypay,securityinstalled); > 360(xxxxx,uppayplugin); 360 Aphone Browser (5.3.1)" "0.013" > Very strange, I execute a query in Hive is normal. I really do not > understand. . . :-( > OK > ip host time method request protocol status size > referer cookieuid requesttime session httpxrequestedwith agent > upstreamresponsetime logdate > 14.151.40.117 www.xxxx.com 10/Apr/2014:23:58:01 +0800 POST > /xxxx.jsp?appid=4&appkey=573bbd2fbd1a6bac082ff4727d952ba3&format=json&sessionid=1397145480&vc=24&vn=v3.5.1&loguid=&deviceid=0f607264fc6318a92b9e13c65db7cd3c%7C02C105A6-6DC8-43D5-879E-46AD603AC34E%7C2096145A-114C-4B6E-BE91-1AC740D9BD21&channel=appstore&method=Update.forceUpdate > HTTP/1.1 200 88 - 0.052 - - xxxxx xxxxx > iPhone Client API 0.005 2014-04-11 > 112.91.89.149 www.xxxx.com 10/Apr/2014:23:58:02 +0800 POST > /xxxx.jsp?appid=2&appkey=9ef269eec4f7a9d07c73952d06b5413f&format=json&sessionid=1397141523139&vc=53&vn=3.5.2&loguid=2888069&deviceid=xxxxx354244055617944&uuid=c7d71164-bae7-4d51-8032-ec9cfafb5e7e&channel=default&method=Info.getinfoV3&virtual=O1wq33EpQ%2Fa%2B8NxjEsy57ZezKBefR85F4L%2BXPZlcSETtw5Fl%2FdQFuLuNUg4Co9zHJyw2jPJipOR3%0Acrc59PUeFvM5hU82AdDQGjIkXa%2FLMWtxuJYz6fJBHLxkQRPWUVCdpeENwrnYlvgAY6DqM9G%2Fh5g1%0AbZamnAgUMERY1iZzPLk%3D%0A > HTTP/1.1 200 2008 xxxxx xxxxx xxxxx POST - 0.033 > - - Mozilla/5.0 (xxxxx) xxxxx/20100101 xxxxx/1.0.0 0.033 > 2014-04-11 > 113.133.68.221 www.xxxx.com 10/Apr/2014:23:58:02 +0800 POST > /xxxx.jsp?appid=2&appkey=9ef269eec4f7a9d07c73952d06b5413f&format=json&sessionid=1397145345830&vc=53&vn=3.5.2&loguid=4377880&deviceid=xxxxx355369055653422&uuid=546e07f7-6439-48d4-8d6d-e5dc0001e569&channel=91_v352&method=Ad.getAd_iMocha&virtual= > HTTP/1.1 200 88 xxxxx xxxxx xxxxx POST - 0.127 > - - Mozilla/5.0 (xxxxx) xxxxx/20100101 xxxxx/1.0.0 0.127 > 2014-04-11 > this is my HQL: > CREATE external TABLE access_log ( > ip STRING, > host STRING, > time STRING, > method STRING, > request STRING, > protocol STRING, > status STRING, > size STRING, > referer STRING, > cookieuid STRING, > requestTime STRING, > session STRING, > httpXRequestedWith STRING, > agent STRING, > upstreamResponseTime STRING > ) > partitioned by (logdate string) > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' > WITH SERDEPROPERTIES ( > "input.regex" = "^([^ ]*) - ([^ ]*) \\[([^\]]*)\\] \"([^ ]*) ([^ ]*) ([^ > ]*)\" (-|[0-9]*) (-|[0-9]*) \"(\.\+\?|-)\" ([^ ]*) ([^ ]*) ([^ ]*) > \"(\.\+\?|-)\" \"(\.\+\?|-)\" \"(\.\+\?|-)\"$", > "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s > %11$s %12$s %13$s %14$s %15$s" > ) > STORED AS TEXTFILE; -- This message was sent by Atlassian JIRA (v6.2#6252)