Hi Ananthanarayanan.M.S,
Actually, we add a new column `__tmst` for each table, to persist the timestamp of each data row. But the sql you list seems not like our exact logic, so I want to see your accuracy rule statement to have a check. if your accuracy rule statement is like: "source.product_id = target.product_id, source.market_id = target.market_id, source.period_id = target.period_id", the sql for that step should be: "SELECT source.* FROM source LEFT JOIN target ON coalesce(source. product_id, 'null') = coalesce(target. product_id, 'null') and coalesce(source.market_id, 'null') = coalesce(target.market_id, 'null') and coalesce(source. period_id, 'null') = coalesce(target. period_id, 'null') WHERE (NOT (source.product_id IS NULL AND source.market_id IS NULL AND source. period_id IS NULL)) AND target.product_id IS NULL AND source.market_id IS NULL AND source. period_id IS NULL". Hope this can help you, thanks. -- Regards, Lionel, Liu At 2017-11-21 06:50:09, "William Guo" <gu...@apache.org> wrote: >hello Ananthanarayanan.M.S, > >We support some partitioned tables, could you show us your partitioned table >description for check and some sample data will be helpful? > >Thanks, >William >________________________________ >From: Ananthanarayanan Ms <ananthanarayanan.ms...@nielsen.com> >Sent: Tuesday, November 21, 2017 3:53 AM >To: William Guo >Cc: dev@griffin.incubator.apache.org; Abishek Kunduru >Subject: Re: Re: Griffin Accuracy Doubts > >Hello William / Lionel, >Could you please let us know if we can use 1.6. Could you please let know if >the partitioned tables alone can be input as src/tgt as below err is got when >we build and run the master code. > >17/11/20 14:51:13 ERROR engine.SparkSqlEngine: run spark sql [ SELECT `__tmst` >AS `__tmst`, source.product_id, source.market_id, >source.period_id,s5_volume_rom_distrib FROM source LEFT JOIN target ON >coalesce(source.period_id, 'null') = coalesce(target.period_id, 'null') and >coalesce(source.market_id, 'null') = coalesce(target.market_id, 'null') and >coalesce(source.product_id, 'null') = coalesce(target.product_id, 'null') >WHERE (NOT (source.product_id IS NULL )) GROUP BY `__tmst` ] error: Reference >'__tmst' is ambiguous, could be: __tmst#145L, __tmst#146L.; line 1 pos 413 > > > >Regards, >Ananthanarayanan.M.S > >On Wed, Oct 4, 2017 at 2:39 PM, William Guo ><gu...@apache.org<mailto:gu...@apache.org>> wrote: > >hi Abishek, > > >For SQL support, the version should be 0.1.6, we will release in Middle of >October. > > > >Thanks, > >William > > >________________________________ >From: Abishek Kunduru ><abishek.kund...@nielsen.com<mailto:abishek.kund...@nielsen.com>> >Sent: Tuesday, October 3, 2017 12:35 AM >To: 刘力力 >Cc: dev@griffin.incubator.apache.org<mailto:dev@griffin.incubator.apache.org>; >Vidyaranya Malladi; gu...@apache.org<mailto:gu...@apache.org>; Mara Preotescu; >Ananthanarayanan Ms >Subject: Re: Re: Griffin Accuracy Doubts > >Thanks William and Lionel for the quick response. Can you please tell us when >would the Griffin with SQL support be released and its version number? > >Regarding the second question, Vidya is not referring to changing the source >or target data but instead DQ feature to compare source and target columns and >output the difference in terms of percentage change (Vidya gave an example as >well in the below table which outputs the difference between source and target >numeric columns). > >Best, >Abishek > > >On Fri, Sep 29, 2017 at 10:13 PM, 刘力力 ><bhlx3l...@163.com<mailto:bhlx3l...@163.com>> wrote: >Hi Vidyaranya, > >I have some more comments about the answers from William. Pls check the >following comments. > > >1. Does Griffin plan to support multi table being src eg. src table frame from >a qry joined thro tabl1,tabl2,tabln and selected few cols and then finally use >that as src and similarly for destination as well > >--Yes, Since we have change our backend to fully support sql. > >2. Would Griffin support the result metric col wise instead of overall matched >count. >eg if src table is having 4 cols and dest is having 4 or more cols and user >wants to check the accuracy of col4 which is 'price' in src and destination by >having a new col added to the src table as '%change' which would be >abs(src_price - dest_price)/100 > >--Yes, Since we have change our backend to fully support sql. > >Fact_tag fact_sDesc fat_Ldesc SRC Fact_precision SRC >Data TGT Data % Chg >f000000000000000000100000000000000000000 Units Sales Units 1 > 25510.00 2551.00 0.00 >f000000000000000000100000000000000000001 Units (any promo) Sales >Units (any promo) 1 5846.00 584.59 0.00 >f000000000000000000100000000000000000002 Units (no promo) Sales >Units (no promo) 1 17172.00 1717.25 0.00 > > >--[commont from Lionel] Actually, I think the modification of data source is >like some ETL task, not the DQ domain, we consider that the DQ result from >griffin should be simple like a value or percentage or group of it, that might >be better for DQ check. > > >3. Support udfs on the cols before comparing, eg one of our src is a table >from mainframe format where dots/decimals are ignore and we have to apply a >logic to get the decimals > >--For UDFs, if it registered in your computing cluster, it should be fine from >command line. we are considering to refactor ui to support sql editor directly. >but how to provide better user experience from front end for UDF, we are still >considering it. > >4. Last time we discussed few features were said to be discussed to support >could you please let us know if we have any road map for the same. like >advanced sql features and adding groupby clause/udfs etc > >--Support SQL is already done, will have a quick release after we fully tested >and reviewed. > >5. Griffin compares the cols selected by using them in joins, do we plan to >enhance to check if the user could select few cols other than the mentioned in >joins and we need to compare the src col's value vs dest col's value lets say >without the join and only by order by clause. > > >select col from src order by prikey >select col from tgt order by prikey > >src.col vs tgt.col instead of the join key . Please let us know. > >-- For this case, it depends on how would you like to map src.col with >tgt.col, we don't need join key if you can elaborate mapping in sql since we >are fully supporting sql. > >--[comment from Lionel] We support sql directly, if you can describe your case >in sql, we can do that. > >Thanks very much. > > >-- >Regards, >Lionel, Liu > > >At 2017-09-30 08:23:39, "William Guo" ><guo...@outlook.com<mailto:guo...@outlook.com>> wrote: >>hi Vidyaranya, >> >> >>Please check my comments inline. >> >> >>Thanks, >> >>William >> >> >>________________________________ >>From: Vidyaranya Malladi >><vidyaranya.malladi...@nielsen.com<mailto:vidyaranya.malladi...@nielsen.com>> >>Sent: Friday, September 29, 2017 10:06 PM >>To: gu...@apache.org<mailto:gu...@apache.org>; >>guo...@outlook.com<mailto:guo...@outlook.com> >>Cc: >>dev@griffin.incubator.apache.org<mailto:dev@griffin.incubator.apache.org>; >>Mara Preotescu; Abishek Kunduru; Ananthanarayanan Ms >>Subject: Griffin Accuracy Doubts >> >>Dear William, >> Good day. Please let us know on the below questions that we have >> >>1. Does Griffin plan to support multi table being src eg. src table frame >>from a qry joined thro tabl1,tabl2,tabln and selected few cols and then >>finally use that as src and similarly for destination as well >> >>--Yes, Since we have change our backend to fully support sql. >> >>2. Would Griffin support the result metric col wise instead of overall >>matched count. >>eg if src table is having 4 cols and dest is having 4 or more cols and user >>wants to check the accuracy of col4 which is 'price' in src and destination >>by having a new col added to the src table as '%change' which would be >>abs(src_price - dest_price)/100 >> >>--Yes, Since we have change our backend to fully support sql. >> >>Fact_tag fact_sDesc fat_Ldesc SRC Fact_precision SRC >>Data TGT Data % Chg >>f000000000000000000100000000000000000000 Units Sales Units 1 >> 25510.00 2551.00 0.00 >>f000000000000000000100000000000000000001 Units (any promo) Sales >>Units (any promo) 1 5846.00 584.59 0.00 >>f000000000000000000100000000000000000002 Units (no promo) Sales >>Units (no promo) 1 17172.00 1717.25 0.00 >> >> >>3. Support udfs on the cols before comparing, eg one of our src is a table >>from mainframe format where dots/decimals are ignore and we have to apply a >>logic to get the decimals >> >>--For UDFs, if it registered in your computing cluster, it should be fine >>from command line. we are considering to refactor ui to support sql editor >>directly. >>but how to provide better user experience from front end for UDF, we are >>still considering it. >> >>4. Last time we discussed few features were said to be discussed to support >>could you please let us know if we have any road map for the same. like >>advanced sql features and adding groupby clause/udfs etc >> >>--Support SQL is already done, will have a quick release after we fully >>tested and reviewed. >> >>5. Griffin compares the cols selected by using them in joins, do we plan to >>enhance to check if the user could select few cols other than the mentioned >>in joins and we need to compare the src col's value vs dest col's value lets >>say without the join and only by order by clause. >> >> >>select col from src order by prikey >>select col from tgt order by prikey >> >>src.col vs tgt.col instead of the join key . Please let us know. >> >>-- For this case, it depends on how would you like to map src.col with >>tgt.col, we don't need join key if you can elaborate mapping in sql since we >>are fully supporting sql. >> >>Thank you very much. >> >> >> >>Thanks & Regards >>M. Vidyaranya >>NDX >>TCS - Nielsen Relationship >>Mobile : +91 9790591311 > > > >【网易自营|30天无忧退货】仅售同款价1/4!MUJI制造商“2017秋冬舒适家居拖鞋系列”限时仅34.9元>> ><http://you.163.com/item/detail?id=1165011&from=web_gg_mail_jiaobiao_9> > >