Hi Shaofeng,

Sorry for my slow response. There is indeed a serialization issue in the Spark context. Here are my thoughts:

* Some fields are initialized in the constructor, which means they do NOT need to be serialized; we can mark these fields as 'transient'.
* Do we really need to serialize CachedTreeMap to the Spark executor tasks? Having each task initialize its own CachedTreeMap instance is another option.
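To illustrate the first suggestion, here is a minimal sketch, not the actual Kylin code. FakePath and CachedMapSketch are hypothetical stand-ins for org.apache.hadoop.fs.Path and CachedTreeMap, and the example URI is made up. The idea is to keep the location as a plain String, mark the non-serializable handle 'transient', and rebuild it in readObject, since constructors of Serializable classes are not run again during Java deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientPathSketch {

    // Hypothetical stand-in for org.apache.hadoop.fs.Path: deliberately NOT Serializable.
    static final class FakePath {
        final String uri;
        FakePath(String uri) { this.uri = uri; }
    }

    // Hypothetical stand-in for CachedTreeMap (assumption: only the base directory matters here).
    static final class CachedMapSketch implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String baseDir;         // serialized as an ordinary String
        private transient FakePath basePath;  // skipped by Java serialization

        CachedMapSketch(String baseDir) {
            this.baseDir = baseDir;
            this.basePath = new FakePath(baseDir);
        }

        FakePath basePath() { return basePath; }

        // Called by Java serialization on the receiving side (e.g. an executor):
        // restore the non-transient fields, then rebuild the transient one.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.basePath = new FakePath(baseDir);
        }
    }

    // Serialize and deserialize in memory, roughly what happens when an object is shipped to a task.
    static CachedMapSketch roundTrip(CachedMapSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (CachedMapSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        CachedMapSketch copy = roundTrip(new CachedMapSketch("hdfs:///example/dict"));
        System.out.println(copy.basePath().uri);
    }
}
```

Without the 'transient' modifier, writeObject would fail with a NotSerializableException for FakePath, which is the same kind of failure as in the stack trace quoted below.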
Please feel free to change the code if you really need to serialize CachedTreeMap, and let me know if there is anything I can help with.

> On January 20, 2017, at 14:04, ShaoFeng Shi <[email protected]> wrote:
>
> Hi Yerui,
>
> I noticed that CachedTreeMap.java uses a couple of classes from the
> org.apache.hadoop.fs package, and you have a comment: "TODO Depends on HDFS
> for now, ideally just depends on storage interface".
>
> This now affects cube building with Spark, because some classes, such as
> org.apache.hadoop.fs.Path, aren't serializable, while Spark relies heavily
> on Java serialization. Building a cube with a bitmap measure fails with the
> error below. So, can they be changed to ordinary classes like String here?
> Thanks!
>
> Caused by: java.io.NotSerializableException: org.apache.hadoop.fs.Path
> Serialization stack:
>   - object not serializable (class: org.apache.hadoop.fs.Path, value: hdfs:/kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP)
>   - writeObject data (class: java.util.TreeMap)
>   - object (class org.apache.kylin.dict.CachedTreeMap, {=null})
>   - field (class: org.apache.kylin.dict.AppendTrieDictionary, name: dictSliceMap, type: class java.util.TreeMap)
>   - object (class org.apache.kylin.dict.AppendTrieDictionary, AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/))
>   - writeObject data (class: java.util.HashMap)
>   - object (class java.util.HashMap,
>     {DEFAULT.TEST_KYLIN_FACT.LSTG_SITE_ID=org.apache.kylin.dict.TrieDictionaryForest@f30773fa,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL2_NAME=org.apache.kylin.dict.TrieDictionaryForest@18259639,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.META_CATEG_NAME=org.apache.kylin.dict.TrieDictionaryForest@44184626,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_SELLER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_BUYER_LEVEL=org.apache.kylin.dict.TrieDictionaryForest@879f6439,
>     DEFAULT.TEST_KYLIN_FACT.TRANS_ID=org.apache.kylin.dict.TrieDictionaryForest@93b5aa11,
>     DEFAULT.TEST_CATEGORY_GROUPINGS.CATEG_LVL3_NAME=org.apache.kylin.dict.TrieDictionaryForest@a494947b,
>     SELLER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
>     BUYER_COUNTRY:DEFAULT.TEST_COUNTRY.NAME=org.apache.kylin.dict.TrieDictionaryForest@b3559b4c,
>     SELLER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
>     BUYER_ACCOUNT:DEFAULT.TEST_ACCOUNT.ACCOUNT_COUNTRY=org.apache.kylin.dict.TrieDictionaryForest@410216c0,
>     DEFAULT.TEST_KYLIN_FACT.PRICE=org.apache.kylin.dict.TrieDictionaryForest@89f144c6,
>     DEFAULT.TEST_KYLIN_FACT.TEST_COUNT_DISTINCT_BITMAP=AppendTrieDictionary(hdfs:///kylin/kylin_default_instance/resources/GlobalDict/dict/DEFAULT.TEST_KYLIN_FACT/TEST_COUNT_DISTINCT_BITMAP/),
>     DEFAULT.TEST_KYLIN_FACT.LEAF_CATEG_ID=org.apache.kylin.dict.TrieDictionaryForest@25e701d0,
>     DEFAULT.TEST_KYLIN_FACT.SLR_SEGMENT_CD=org.apache.kylin.dict.TrieDictionaryForest@dcfc7d11,
>     DEFAULT.TEST_KYLIN_FACT.CAL_DT=DateStrDictionary [pattern=yyyy-MM-dd, baseId=0]})
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
