[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-09-05 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-528267529
 
 
   I have solved this.
   The cause was that the map task memory was too small. Memory config:
   mapreduce.map.memory.mb 20096
   mapreduce.reduce.memory.mb 9096
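
   For readers who run the segment build through a Druid Hadoop index task, one common place for such properties is `jobProperties` inside `tuningConfig`. The sketch below only renders that fragment with the two values quoted above; where the commenter actually set them is not stated, so the placement is an assumption.

```python
import json

# Values quoted in the comment above.
job_properties = {
    "mapreduce.map.memory.mb": "20096",
    "mapreduce.reduce.memory.mb": "9096",
}

# Assumption: the properties are passed via the Hadoop task's tuningConfig.
tuning_config_fragment = {
    "tuningConfig": {
        "type": "hadoop",
        "jobProperties": job_properties,
    }
}

print(json.dumps(tuning_config_fragment, indent=2))
```

   The same keys can also be set cluster-wide in mapred-site.xml; the per-task form keeps the larger containers limited to this job.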


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-09-04 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-528175759
 
 
   I tested this PR:
   
   **source** data: 300 million;
   **dictionary** build time: 45 min; dictionary size: 7 GB;
   dictionary info: Total 191707672 values and 48 slices,min:0,max:191707672
   
   **There is one problem: the segment build runs for a long time and does not finish.**
   Here is the map task log:
   `2019-09-05T09:32:00,414 INFO [main] org.apache.parquet.hadoop.InternalParquetRecordReader - at row 0. reading next block
   2019-09-05T09:32:00,975 INFO [main] org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 561 ms. row count = 732366
   2019-09-05T09:39:10,079 INFO [main] org.apache.kylin.dict.AppendTrieDictionary - Evict slice with key APArDCycNA1LBbSBJN114twCcdStEcZQtVU= and value null caused by COLLECTED, size 30/48
   2019-09-05T09:39:10,079 INFO [main] org.apache.kylin.dict.AppendTrieDictionary - Evict slice with key ALDreChzKg1LBUt0Rr_TW-HAbTiuSotei-I= and value null caused by COLLECTED, size 30/48
   2019-09-05T09:39:10,080 INFO [main] org.apache.kylin.dict.AppendTrieDictionary - Evict slice with key ACDvscHggA1LBdZmGFsn8cWVhiIyEjszhu0= and value null caused by COLLECTED, size 30/48
   2019-09-05T09:39:10,080 INFO [main] org.apache.kylin.dict.AppendTrieDictionary - Evict slice with key  and value null caused by COLLECTED, size 30/48
   2019-09-05T09:39:10,080 INFO [main] org.apache.kylin.dict.AppendTrieDictionary - Evict slice with key AMBh5j9swQ9LBfOZxCJ6vLZh1ms8RIDZmW4= and value null caused by COLLECTED, size 30/48`
   
   The MapReduce memory config for the segment build:
   `
   mapreduce.map.memory.mb 10096
   mapreduce.reduce.memory.mb 9096
   `
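
   Some rough arithmetic on the figures in this comment may help when reading the log: the time between the Parquet block being read and the first slice eviction, and the average dictionary slice size. This is only a sketch; it assumes "7GB" means 7 GiB and that values are spread evenly over the 48 slices, neither of which is stated in the comment.

```python
from datetime import datetime

# Timestamps copied from the map task log above.
FMT = "%Y-%m-%dT%H:%M:%S,%f"
block_read = datetime.strptime("2019-09-05T09:32:00,975", FMT)
first_evict = datetime.strptime("2019-09-05T09:39:10,079", FMT)

# Seconds with no log output between reading the block and the first eviction.
gap_seconds = (first_evict - block_read).total_seconds()

# Dictionary figures copied from the comment above.
total_values = 191_707_672
slices = 48
dict_bytes = 7 * 1024**3  # assumption: "7GB" read as 7 GiB

values_per_slice = total_values / slices    # assumes an even spread
bytes_per_value = dict_bytes / total_values

print(f"gap: {gap_seconds:.3f} s")
print(f"values per slice: {values_per_slice:,.0f}")
print(f"bytes per value: {bytes_per_value:.1f}")
```

   The gap works out to roughly seven minutes for a single block of 732366 rows, and the `size 30/48` in the eviction messages suggests that not all 48 slices stay in memory at once.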
   
   





[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-07-17 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-512133542
 
 
   Is there a plan to support uniq queries in SQL?
   For example, theta sketches can be used in SQL through APPROX_COUNT_DISTINCT_DS_THETA.
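
   For reference, this is roughly how the theta sketch function is called through Druid SQL today. The sketch only builds the request body; the datasource name `wikiticker`, the `user` column, and the `unique_users` alias are assumptions for illustration, and the function requires the druid-datasketches extension to be loaded.

```python
import json

# Assumption: a datasource named "wikiticker" with a "user" column.
sql = (
    'SELECT APPROX_COUNT_DISTINCT_DS_THETA("user") AS unique_users '
    'FROM "wikiticker"'
)

# Druid's SQL API accepts a JSON body with a "query" field,
# POSTed to /druid/v2/sql on the Broker (host and port omitted here).
payload = json.dumps({"query": sql})
print(payload)
```

   An exact-count equivalent for this PR's aggregator would presumably need its own SQL function registered in the same way; nothing in this thread says one exists yet.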





[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-07-16 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-512086460
 
 
   > > From the error message, I think there must be an error in the dictionary build stage, so that during ingestion a key can't be found in the dictionary!
   > > Do you have the same error when using the wikiticker-2015-09-12-sampled.json data? @pzhdfy
   > 
   > Yes, this is a known issue: non-ASCII characters may not be encoded correctly in BuildDictJob.
   > We fixed it in our internal version; I will commit it later.
   
   Thank you for your attention.
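
   To illustrate how an encoding mismatch can produce a "value not exists" lookup failure, here is a minimal sketch. It is not the actual BuildDictJob code, and the choice of ASCII as the lossy charset is an assumption; `build_dictionary` is a hypothetical helper that stands in for the dictionary build stage.

```python
value = "Камарад Че"


def build_dictionary(values, encoding):
    """Map each value's encoded bytes to a sequential integer id."""
    return {
        v.encode(encoding, errors="replace"): i
        for i, v in enumerate(values)
    }


# Hypothetical build stage using a charset that cannot represent Cyrillic:
# every Cyrillic letter is replaced by "?" before it is stored.
lossy_dictionary = build_dictionary([value], "ascii")

# Hypothetical ingestion stage that looks the value up by its UTF-8 bytes.
lookup_key = value.encode("utf-8")
found_in_lossy = lookup_key in lossy_dictionary

# With the same charset on both sides the lookup succeeds.
utf8_dictionary = build_dictionary([value], "utf-8")
found_in_utf8 = lookup_key in utf8_dictionary

print(found_in_lossy, found_in_utf8)
```

   The point is only that both stages must agree on the byte representation of each value; pure ASCII values would match either way, which fits the failure showing up on a Cyrillic user name.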





[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-07-16 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-511800536
 
 
   From the error message, I think there must be an error in the dictionary build stage, so that during ingestion a key can't be found in the dictionary.





[GitHub] [incubator-druid] JackyYangPassion commented on issue #7594: [for master] general exactly count distinct support in Druid

2019-07-16 Thread GitBox
JackyYangPassion commented on issue #7594: [for master] general exactly count 
distinct support in Druid
URL: https://github.com/apache/incubator-druid/pull/7594#issuecomment-511795351
 
 
   When I use the quickstart example wikiticker-2015-09-12-sampled.json with a unique metric on the user column:
   Record:
   
   `{"time":"2015-09-12T00:47:44.963Z","channel":"#ru.wikipedia","cityName":null,"comment":"/* Донецкая Народная Республика */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Караман, Александр Акимович","regionIsoCode":null,"regionName":null,"user":"Камарад Че","delta":0,"added":0,"deleted":0}`
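
   A quick way to see which fields of such a record contain non-ASCII text is sketched below. The record is trimmed to a handful of its fields for brevity, and the check itself is only an illustration, not part of the PR.

```python
import json

# A trimmed copy of the record above (only some fields kept).
record = json.loads(
    '{"time":"2015-09-12T00:47:44.963Z","channel":"#ru.wikipedia",'
    '"comment":"/* Донецкая Народная Республика */",'
    '"page":"Караман, Александр Акимович",'
    '"user":"Камарад Че","delta":0}'
)

# Collect the names of string fields that contain any non-ASCII character.
non_ascii_fields = [
    name
    for name, field_value in record.items()
    if isinstance(field_value, str) and not field_value.isascii()
]

print(non_ascii_fields)
```

   Running a check like this over the whole input file would show how many rows are exposed to the failure.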
   
   Error:
   `java.lang.Exception: io.druid.java.util.common.RE: Failure on row[{"time":"2015-09-12T00:47:44.963Z","channel":"#ru.wikipedia","cityName":null,"comment":"/* Донецкая Народная Республика */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Караман, Александр Акимович","regionIsoCode":null,"regionName":null,"user":"Камарад Че","delta":0,"added":0,"deleted":0}]
       at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.7.3.jar:?]
   Caused by: io.druid.java.util.common.RE: Failure on row[{"time":"2015-09-12T00:47:44.963Z","channel":"#ru.wikipedia","cityName":null,"comment":"/* Донецкая Народная Республика */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Караман, Александр Акимович","regionIsoCode":null,"regionName":null,"user":"Камарад Че","delta":0,"added":0,"deleted":0}]
       at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:93) ~[druid-indexing-hadoop-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
       at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_171]
       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_171]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_171]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_171]
       at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_171]
   Caused by: java.lang.IllegalArgumentException: Value : Камарад Че not exists
       at org.apache.kylin.common.util.Dictionary.getIdFromValue(Dictionary.java:107) ~[kylin-core-common-2.5.2.jar:2.5.2]
       at org.apache.kylin.common.util.Dictionary.getIdFromValue(Dictionary.java:85) ~[kylin-core-common-2.5.2.jar:2.5.2]
       at io.druid.segment.incremental.IncrementalIndex$1InputRowDictWrap.getDimension(IncrementalIndex.java:180) ~[druid-processing-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.query.aggregation.unique.RoaringBitmapComplexMetricSerde$1.extractValue(RoaringBitmapComplexMetricSerde.java:70) ~[druid-uniq-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.query.aggregation.unique.RoaringBitmapComplexMetricSerde$1.extractValue(RoaringBitmapComplexMetricSerde.java:52) ~[druid-uniq-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.segment.incremental.IncrementalIndex$1IncrementalIndexInputRowColumnSelectorFactory$1.getObject(IncrementalIndex.java:264) ~[druid-processing-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.query.aggregation.unique.UniqueBuildAggregator.aggregate(UniqueBuildAggregator.java:53) ~[druid-uniq-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.indexer.InputRowSerde.toBytes(InputRowSerde.java:281) ~[druid-indexing-hadoop-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.indexer.IndexGeneratorJob$IndexGeneratorMapper.innerMap(IndexGeneratorJob.java:371) ~[druid-indexing-hadoop-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at io.druid.indexer.HadoopDruidIndexerMapper.map(HadoopDruidIndexerMapper.java:88) ~[druid-indexing-hadoop-0.12.4-SNAPSHOT.jar:0.12.4-SNAPSHOT]
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)