[
https://issues.apache.org/jira/browse/GRIFFIN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Johnnie updated GRIFFIN-278:
----------------------------
Description:
The Griffin data connector is designed to compare a dataset's accuracy between
source and target.
However, in the big data ecosystem, most sources are huge and have
hundreds of files in one folder. I think it would be great if Griffin could
handle the source by folder instead of by file by default.
In addition, Spark normally reads data from a folder, so in this case we
don't need to union all the files in one folder.
was:
The Griffin data connector is designed to compare a dataset's accuracy between
source and target.
However, in the big data ecosystem, most sources are huge and have
hundreds of files in one folder. I think it would be great if Griffin could
handle the source by folder instead of by file.
> AvroBatchDataConnector handle input is directory
> ------------------------------------------------
>
> Key: GRIFFIN-278
> URL: https://issues.apache.org/jira/browse/GRIFFIN-278
> Project: Griffin
> Issue Type: Improvement
> Reporter: Johnnie
> Priority: Major
>
> The Griffin data connector is designed to compare a dataset's accuracy between
> source and target.
> However, in the big data ecosystem, most sources are huge and have
> hundreds of files in one folder. I think it would be great if Griffin could
> handle the source by folder instead of by file by default.
> In addition, Spark normally reads data from a folder, so in this case we
> don't need to union all the files in one folder.
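
To make the request concrete, here is a minimal sketch of what "handle the
source by folder" could mean for a connector: accept a path that is either a
single file or a folder, and resolve it to the data files behind it. This is
not Griffin code. The function name resolve_input_files, the ".avro" suffix
filter, and the rule of skipping names that start with "_" or "." (marker files
such as _SUCCESS) are all assumptions made for illustration, and the sketch
uses the local filesystem rather than HDFS.

```python
from pathlib import Path
from typing import List


def resolve_input_files(path: str, suffix: str = ".avro") -> List[Path]:
    """Return the data files behind `path`, which may be a file or a folder.

    Hypothetical helper for illustration only; not part of Griffin.
    """
    p = Path(path)
    if p.is_file():
        # A single file is passed through unchanged.
        return [p]
    if p.is_dir():
        # Assumption: data files share one suffix, and names starting with
        # "_" or "." are job markers or hidden files that should be skipped.
        return sorted(
            f
            for f in p.iterdir()
            if f.is_file()
            and f.suffix == suffix
            and not f.name.startswith(("_", "."))
        )
    raise FileNotFoundError(f"input path does not exist: {path}")
```

With a resolver like this, the caller no longer cares whether the configured
path names one file or a folder of hundreds. In Spark itself this step may be
unnecessary, since, as the description notes, Spark normally reads a folder
directly and no per-file union is needed.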