Andries Engelbrecht created DRILL-2456:
------------------------------------------

             Summary: regexp_replace using hex codes fails on larger JSON data sets
                 Key: DRILL-2456
                 URL: https://issues.apache.org/jira/browse/DRILL-2456
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 0.7.0
         Environment: Drill 0.7
MapR 4.0.1
CentOS
            Reporter: Andries Engelbrecht
            Assignee: Daniel Barclay (Drill)
         Attachments: drillbit.log

This query works when it is run against a single file:

select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` 
order by count(id) desc limit 10;

This one fails when it is run against a directory containing multiple files:

select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;

Query failed: Query failed: Failure while trying to start remote fragment, 
Encountered an illegal char on line 1, column 31: '' [ 
43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]

Using literal characters instead of hex codes in the regexp_replace pattern 
does work for the same data set. This query works fine on the full data set:

select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id)  from 
dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
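
For reference, the two patterns used above can be compared outside Drill. 
The sketch below is only an illustration and assumes the pattern is 
ultimately evaluated by java.util.regex; that is an assumption, not something 
confirmed by the log. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class CharClassComparison {
    // Pattern from the failing query: hex escapes, range 0x20-0xAD.
    static final Pattern HEX = Pattern.compile("[^\\x20-\\xad]");

    // Pattern from the working query: literal characters,
    // ranges ' '-'~' (0x20-0x7E) and U+00A1-U+00FF.
    static final Pattern LITERAL = Pattern.compile("[^ -~\u00a1-\u00ff]");

    // Replace every character outside the hex range with the degree sign.
    static String replaceHex(String s) {
        return HEX.matcher(s).replaceAll("\u00b0");
    }

    // Replace every character outside the literal ranges with the degree sign.
    static String replaceLiteral(String s) {
        return LITERAL.matcher(s).replaceAll("\u00b0");
    }

    public static void main(String[] args) {
        String[] samples = { "plain text", "tab\there", "caf\u00e9" };
        for (String s : samples) {
            System.out.println(replaceHex(s) + " | " + replaceLiteral(s));
        }
    }
}
```

Note that the two character classes are not the same set: the hex range 
covers 0x20-0xAD, while the literal ranges cover 0x20-0x7E and 0xA1-0xFF, so 
results can differ for characters such as U+00E9.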

A snippet of drillbit.log showing the error is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
