Hi Karan,

Griffin currently supports text directories; you can use the "text-dir" type
data connector:
https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/data/connector/batch/TextDirBatchDataConnector.scala

However, this data connector can only scan the files in sub-directories at
the nth depth, and the files are read as plain text, so they have no schema
either.
You can give it a try, but I'm not sure it will cover your case.
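
To make the limitation concrete, here is a rough sketch of what "scan the
files at the nth depth and read them as text" amounts to. It is plain Python
against a local directory, not Griffin's implementation: the function names
files_at_depth and read_as_text are made up for illustration, and I am
assuming "nth depth" means files whose parent directory sits exactly n levels
below the configured root.

```python
import os


def files_at_depth(root, depth):
    """Return files whose parent directory is exactly `depth` levels below root.

    Hypothetical helper: depth 0 means files directly inside `root`.
    """
    root = os.path.abspath(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level == depth:
            found.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(found)


def read_as_text(path):
    """Read a file as plain lines: each record is one untyped string."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

Note that a CSV header line comes back as just another string here, which is
why column names and types are not available to the rules.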

The best way is to implement your own data connector. Just refer to the one
I listed above; you can also add a function that reads a schema file. It's
not complicated.
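
As an illustration of the schema-file idea only, here is a minimal sketch in
plain Python. A real connector would live in the measure module and be
written in Scala; everything below is an assumption made for the example,
including the "column:type" schema file format, the type names in CASTS, and
the function names parse_schema and read_typed_rows.

```python
import csv
import io

# Hypothetical type names for the schema file; this is not a Griffin format.
CASTS = {"string": str, "int": int, "double": float}


def parse_schema(schema_text):
    """Parse lines of the form 'column:type' into (name, cast) pairs."""
    schema = []
    for raw in schema_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, type_name = (part.strip() for part in line.split(":", 1))
        schema.append((name, CASTS[type_name]))
    return schema


def read_typed_rows(csv_text, schema):
    """Apply the schema to header-less CSV text and return one dict per row."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # csv.reader yields [] for blank lines
        if len(record) != len(schema):
            raise ValueError(
                f"expected {len(schema)} fields, got {len(record)}"
            )
        rows.append(
            {name: cast(value) for (name, cast), value in zip(schema, record)}
        )
    return rows
```

With a schema of "id:int", "name:string", "score:double", a text line such as
"1,alice,3.5" becomes a record with typed columns instead of a single string,
which is what the profiling rules would need.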

Thanks,
Lionel


On Mon, May 28, 2018 at 5:08 PM, Karan Gupta <karan.gu...@tavant.com> wrote:

> Hi Lionel,
>
>
>
> The entry point for my data flow is CSV files, on which I want to run
> profiling jobs instead of on Hive tables. These CSV files will be subjected
> to profiling and a health check before they move into the data flow. The
> files will be on HDFS. Hence, I have a couple of questions:
>
>
>
> Does profiling support files instead of Hive tables? If yes, can I point
> my "data.source" to an HDFS directory instead of specifying a file each
> time, so that Griffin runs the profiling job on each newly added file in
> that HDFS location?
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
>
>
> *From:* Lionel Liu <lionel...@apache.org>
> *Sent:* Monday, May 28, 2018 1:58 PM
> *To:* Karan Gupta <karan.gu...@tavant.com>; dev@griffin.incubator.apache.org
> *Subject:* Re: Apache Griffin Profiling
>
>
>
> Hi Karan,
>
>
>
> Do you mean that you want to put your profiling config files in an HDFS
> directory, and let Griffin scan the directory to get the config files at
> run time?
>
>
>
> The Griffin measure module doesn't support this currently. You can refer to
> the code entry point and implement your own param file reader if you want
> to do that:
>
> https://github.com/apache/incubator-griffin/blob/master/measure/src/main/scala/org/apache/griffin/measure/Application.scala#L170
>
> https://github.com/apache/incubator-griffin/tree/master/measure/src/main/scala/org/apache/griffin/measure/config/reader
>
>
>
> But in my opinion, it may not be appropriate to do such work in the measure
> module. It seems more like scheduling work to be done before submitting
> Griffin jobs.
>
>
>
> Thanks,
>
> Lionel
>
>
>
>
>
> On Mon, May 28, 2018 at 3:21 PM, Karan Gupta <karan.gu...@tavant.com>
> wrote:
>
> Hi Lionel,
>
>
>
> Thank you for your response. I created a single custom rule for multiple
> sources. Now I am trying to run profiling jobs where my source is not
> tightly coupled to a rule. I want to run profiling jobs by just pointing
> to an HDFS directory instead of a specific file (Griffin should pick up
> the file name from the directory at run time).
> Is it possible to do that through Griffin?
>
>
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
