Re: GSoC Ideas

2018-03-27 Thread Talat Uyarer
Thank you Kevin! You saved my life.

On Mon, Mar 26, 2018 at 10:14 PM, Kevin Ratnasekera  wrote:

> Hi Talat,
>
> Please make sure you submit a proposal before the deadline. Deadline is
> March 27 16:00 UTC.
>
> Regards
> Kevin


-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304


Re: GSoC Ideas

2018-03-26 Thread Kevin Ratnasekera
Hi Talat,

Please make sure you submit a proposal before the deadline. Deadline is
March 27 16:00 UTC.

Regards
Kevin

On Fri, Mar 16, 2018 at 6:18 AM, lewis john mcgibbney 
wrote:

> Hi Talat,
> In all honesty, I don't have the time I used to have to look into this.
> I have been experimenting with Arrow on multi-dimensional array-based
> data, but nothing else.
> I would therefore probably be learning as much as you if this project were
> to go ahead.
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


Re: GSoC Ideas

2018-03-15 Thread lewis john mcgibbney
Hi Talat,
In all honesty, I don't have the time I used to have to look into this.
I have been experimenting with Arrow on multi-dimensional array-based
data, but nothing else.
I would therefore probably be learning as much as you if this project were
to go ahead.
Lewis

On Thu, Mar 15, 2018 at 3:46 PM, Talat Uyarer  wrote:

> @Lewis I found a PR [0] on the Arrow Git repo. I guess they stuck with the
> avro-c library. Do you know whether they need to implement the same thing
> for every language they support, or just a wrapper?
>
> If we can use Arrow for our internal serialization, Gora will be super
> fast with zero-copy support. :)
>
> [0] https://github.com/apache/arrow/pull/1026
>
> My 2 cents
>
> --
> Talat UYARER
> Websitesi: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: GSoC Ideas

2018-03-15 Thread Renato Marroquín Mogrovejo
That also sounds like a great project, Furkan!
Hopefully someone takes it. I will be happy to provide code reviews and
comments :)


Best,

Renato M.

2018-03-15 11:57 GMT+01:00 Furkan KAMACI :

> Hi Fellows,
>
> I'll also mentor a project this year and I can help with the topics already
> mentioned. I think that https://issues.apache.org/jira/browse/GORA-532 could
> be another issue for GSoC. Also,
> https://issues.apache.org/jira/browse/GORA-450 can be a warm-up issue for
> any GSoC projects on which I can collaborate.
>
> Kind Regards,
> Furkan KAMACI
>


Re: GSoC Ideas

2018-03-15 Thread Furkan KAMACI
Hi Fellows,

I'll also mentor a project this year and I can help with the topics already
mentioned. I think that https://issues.apache.org/jira/browse/GORA-532 could
be another issue for GSoC. Also,
https://issues.apache.org/jira/browse/GORA-450 can be a warm-up issue for
any GSoC projects on which I can collaborate.

Kind Regards,
Furkan KAMACI

On Thu, Mar 15, 2018 at 12:59 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Lewis,
>
> Thanks for pointing out [0]! I guess it makes sense, and there might be
> some performance to be gained by doing the transformation directly from
> Avro to Arrow.
>
> Yes, Lewis, I totally agree with you that having Gora serialize all
> Hadoop metrics would be an awesome project. Is that a GSoC project
> already? Are you planning to mentor any projects?
>
> Also, regarding this project integration topic, have you thought about
> providing Any23 with a way to read/write XML, HTML, and JSON objects through
> Gora? Do you think that would be an interesting project for the Any23 community?
>
>
> Best,
>
> Renato M.
>


Re: GSoC Ideas

2018-03-15 Thread Renato Marroquín Mogrovejo
Hi Lewis,

Thanks for pointing out [0]! I guess it makes sense, and there might be
some performance to be gained by doing the transformation directly from
Avro to Arrow.

Yes, Lewis, I totally agree with you that having Gora serialize all
Hadoop metrics would be an awesome project. Is that a GSoC project
already? Are you planning to mentor any projects?

Also, regarding this project integration topic, have you thought about
providing Any23 with a way to read/write XML, HTML, and JSON objects through
Gora? Do you think that would be an interesting project for the Any23 community?


Best,

Renato M.

2018-03-15 8:26 GMT+01:00 lewis john mcgibbney :

> I should also say, ALL of the projects below which I have named require
> the Gora dependency to be upgraded.
> Lewis
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


Re: GSoC Ideas

2018-03-15 Thread lewis john mcgibbney
Hi Renato,

On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey guys,
>
> There might not be an integration/converter from Arrow to Avro (and/or
> vice versa) because there are Parquet readers that can take Avro, and once
> the data is in Parquet, Arrow can be used directly.
>

Yes there might not be. I actually raised this issue [0] a wee while ago on
the Arrow list. At that time I was told, "...The use case you outline makes
a lot of sense for Arrow to help out with. We don't yet have an AVRO <>
Arrow converter written but it is something that would be great to have."
So maybe that would be something to keep in mind.

[0] https://s.apache.org/2GwS


> Regarding an integration of Parquet with Gora, I think it would be
> interesting to make it easier for people to read and write Parquet files by
> providing a higher-level API, as Gora does. However, for you @Talat, since
> you know Gora pretty well, maybe you could take another project that
> helps Gora more. For example, fixing the integration with Nutch. There are
> multiple loose ends in Nutch 2.x and Gora that we have neglected as a
> community.
> IMHO that should be a GSoC project.
>

ACK, other existing projects which consume Gora are (off the top of my
head),

   - Chukwa - https://s.apache.org/cW6a
   - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
   - Camel - https://camel.apache.org/gora.html
   - Nutch 2.X - https://github.com/apache/nutch/tree/2.x

An interesting idea I had for where Gora could be used is Hadoop metrics:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html

This would provide a textbook use case for Gora: storing Hadoop metrics in
some datastore, which would then be exposed for query and analysis.

> I can't mentor it because I do not have enough insight into this, but
> @Lewis and @Talat, you can probably tackle this as mentor and student. This
> would be an awesome contribution to the project, as there are quite a lot of
> people going over Nutch and trying to use it with Gora.
> Just my 2c
>
>
Understood, Renato, no biggie. Thanks for your input. I know you are working
with Parquet a lot these days, so your input is appreciated.
Lewis


Re: GSoC Ideas

2018-03-14 Thread Renato Marroquín Mogrovejo
Hey guys,

There might not be an integration/converter from Arrow to Avro (and/or
vice versa) because there are Parquet readers that can take Avro, and once
the data is in Parquet, Arrow can be used directly.
Regarding an integration of Parquet with Gora, I think it would be
interesting to make it easier for people to read and write Parquet files by
providing a higher-level API, as Gora does. However, for you @Talat, since
you know Gora pretty well, maybe you could take another project that
helps Gora more. For example, fixing the integration with Nutch. There are
multiple loose ends in Nutch 2.x and Gora that we have neglected as a
community.
IMHO that should be a GSoC project. I can't mentor it because I do not have
enough insight into this, but @Lewis and @Talat, you can probably tackle this
as mentor and student. This would be an awesome contribution to the project,
as there are quite a lot of people going over Nutch and trying to use it
with Gora.
Just my 2c


Renato M.

2018-03-13 17:23 GMT+01:00 lewis john mcgibbney :

> Hi Talat,
>
> On Tue, Mar 13, 2018 at 9:07 AM, Talat Uyarer  wrote:
>
>> Hi Lewis,
>> Yes, I checked the Jira and saw the Redis and Apache Ignite integrations. I
>> just wanted to ask you guys about more crucial issues.
>>
>> What about Apache Arrow? Most projects are starting to use it. I have one
>> concern about it: Parquet and Arrow are both columnar, whereas Avro gives
>> us flexibility.
>> [1] https://arrow.apache.org/
>>
>
> I agree. Some thought certainly needs to go into the integration of these
> formats. Note that Arrow does not have an Avro integration yet, which
> suggests that the data modeling is non-trivial; otherwise I suspect it would
> already have been done.
> @Renato, do you have some thoughts on this?
> Lewis
>


Re: GSoC Ideas

2018-03-13 Thread lewis john mcgibbney
Hi Talat,

On Tue, Mar 13, 2018 at 9:07 AM, Talat Uyarer  wrote:

> Hi Lewis,
> Yes, I checked the Jira and saw the Redis and Apache Ignite integrations. I
> just wanted to ask you guys about more crucial issues.
>
> What about Apache Arrow? Most projects are starting to use it. I have one
> concern about it: Parquet and Arrow are both columnar, whereas Avro gives
> us flexibility.
> [1] https://arrow.apache.org/
>

I agree. Some thought certainly needs to go into the integration of these
formats. Note that Arrow does not have an Avro integration yet, which
suggests that the data modeling is non-trivial; otherwise I suspect it would
already have been done.
@Renato, do you have some thoughts on this?
Lewis


Re: GSoC Ideas

2018-03-13 Thread Talat Uyarer
Hi Lewis,
Yes, I checked the Jira and saw the Redis and Apache Ignite integrations. I
just wanted to ask you guys about more crucial issues.

What about Apache Arrow? Most projects are starting to use it. I have one
concern about it: Parquet and Arrow are both columnar, whereas Avro gives
us flexibility.
[1] https://arrow.apache.org/
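
To make the row-versus-columnar concern concrete, here is a minimal sketch in
plain Python. It uses no Arrow, Parquet, or Avro library; the record fields and
the helper name rows_to_columns are hypothetical and chosen only for
illustration. It pivots Avro-style row records into one list per field, which
is roughly the reshaping a columnar format implies, and shows that a field
missing from some records has to be padded.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per field.

    Fields missing from a record are padded with None so that every
    column ends up with the same length as the input.
    """
    # Collect field names in first-seen order across all records.
    fields: List[str] = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    # Build one column per field, padding absent values with None.
    return {name: [row.get(name) for row in rows] for name in fields}


# Hypothetical records; the second one carries an extra optional field.
rows = [
    {"url": "http://a.example/", "status": 200},
    {"url": "http://b.example/", "status": 404, "retries": 2},
]
columns = rows_to_columns(rows)
print(columns)
```

In the row layout each record can carry its own set of fields, while the
columnar layout needs every column to cover every record, so any mapping
between the two has to decide how optional fields are represented.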

Talat

On Mon, Mar 12, 2018 at 7:01 PM, lewis john mcgibbney 
wrote:

> Hi Talat,
> Head over to JIRA and look for GORA issues tagged with ‘gsoc2018’.
>
> Another issue I can think of would be to implement Parquet as a
> potential underlying SerDe implementation alongside Avro.
> Lewis
>
> On Mon, Mar 12, 2018 at 14:43 Talat Uyarer wrote:
>
>> Hi All,
>>
>> I have a chance to apply to GSoC this year. I want to spend my time on
>> Gora. Do you have any suggestions for GSoC?
>>
>> Thanks
>>
>>
>> --
>> Talat UYARER
>> Websitesi: http://talat.uyarer.com
>> Twitter: http://twitter.com/talatuyarer
>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304


Re: GSoC Ideas

2018-03-12 Thread lewis john mcgibbney
Hi Talat,
Head over to JIRA and look for GORA issues tagged with ‘gsoc2018’.

Another issue I can think of would be to implement Parquet as a
potential underlying SerDe implementation alongside Avro.
Lewis

On Mon, Mar 12, 2018 at 14:43 Talat Uyarer wrote:

> Hi All,
>
> I have a chance to apply to GSoC this year. I want to spend my time on
> Gora. Do you have any suggestions for GSoC?
>
> Thanks
>
>
> --
> Talat UYARER
> Websitesi: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc