[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyang resolved KYLIN-1835.
---------------------------
    Resolution: Not A Bug

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1835
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1835
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2, v1.5.2.1
>            Reporter: Richard Calaba
>            Priority: Minor
>
> I believe I have discovered an error in Kylin related to count_distinct with exact precision.
> I am not 100% sure, but everything points to a design limit for count_distinct. Please assess / confirm / reject my observation.
>
> Background info:
> =============
> - Large fact table: ~100 million rows.
> - Large customer dimension: ~10 million rows.
> I defined 2 KPIs of type COUNT_DISTINCT with exact precision (return type bitmap) on 2 high-cardinality fields of type Bigint. The first measure is expected to have at most 15,000,000 distinct values; the second can have more, approx. 50 million (just an estimate).
>
> Error info:
> ========
> The cube build runs fine until #7 Step Name: Build Base Cuboid Data, where it errors out without further details in the Kylin log; the log shows only "no counters for job job_1463699962519_16085".
> The MR logs of job job_1463699962519_16085 show exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NumberFormatException: For input string: "-6628245177096591402"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:495)
>         at java.lang.Integer.parseInt(Integer.java:527)
>         at org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>         at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>         at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>         at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>         at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>         at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>         at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> Going by the signature of the exception and the measure's precision return type "bitmap", it looks as though choosing exact precision (which the UI says is supported for int types) causes this exception because I am passing a Bigint field?
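In the trace, `BitmapCounter.add` (BitmapCounter.java:63) calls `Integer.parseInt`, and that call fails. The sketch below uses only the plain JDK (it is not Kylin code; the class name `ParseRangeDemo` is hypothetical) to show why the value from the log cannot go through that call. The value lies below `Integer.MIN_VALUE`, so `Integer.parseInt` throws the same `NumberFormatException`, while `Long.parseLong`, which covers the full 64-bit Hive BIGINT range, accepts it. Both expected cardinalities in the report (15 million and ~50 million) fit easily in an `int`. The failure therefore appears to depend on the range of the individual values, not on how many distinct values there are.

```java
public class ParseRangeDemo {

    // Mirrors the call in the failing frame: true if Integer.parseInt accepts the string.
    static boolean parsesAsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the string fits a signed 64-bit long, i.e. the range of a Hive BIGINT.
    static boolean parsesAsLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value copied verbatim from the MR log above.
        String fromLog = "-6628245177096591402";
        long value = Long.parseLong(fromLog);

        System.out.println("int range: " + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println("value below Integer.MIN_VALUE: " + (value < Integer.MIN_VALUE));
        System.out.println("parses as int:  " + parsesAsInt(fromLog));
        System.out.println("parses as long: " + parsesAsLong(fromLog));

        // Trigger the same exception type as in the log and print its message.
        try {
            Integer.parseInt(fromLog);
        } catch (NumberFormatException e) {
            System.out.println("Integer.parseInt failed: " + e.getMessage());
        }
    }
}
```

To count the Bigint values themselves exactly, they would first need to be mapped to int-range ids, for example through a dictionary, before going into an int-indexed bitmap. This report does not say whether Kylin v1.5.2 offers such an encoding for this measure.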
> If so, is that a bug (refactoring for Bigint needed) or a design limitation? Can count_distinct not be implemented for Bigint with exact precision, or do I have to use count_distinct with an error rate instead?
> If I do not need to calculate count_distinct for all dimension combinations, I might add some mandatory dimensions to the aggregation group, but I am not sure whether this would resolve the issue (assuming I keep the exact-precision counts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)