Re: ComPE-2021: Call for Papers and Nominations for Awards (Deadline: 30 July 2021)

2021-07-16 Thread Rahul Ravindran
Remove me too.

> On Jul 16, 2021, at 1:48 PM, Prajwal Nagaraj  wrote:
> 
> I don't even know who you guys are, I'm nowhere near Rajasthan someone 
> must've sent you a wrong email address.
> Please remove me from your mailing list. 
> 
> 
> Thank you
> 
> On Fri, 16 Jul, 2021, 1:45 pm Parth Patpatiya,  > wrote:
> Dear sir,
> 
> My personal experience has been very awkward regarding your ignorant behaviour 
> and untimely responses to emails. I have been repeatedly trying to contact 
> you about acceptance in the Scopus books. 
> 
> 
> 
> Parth Patpatiya
> Assistant Professor
> Banasthali Vidyapith
> Rajasthan 304022
> 
> On Fri, 16 Jul, 2021, 1:20 pm Dr. J. K. Verma,  > wrote:
> 2nd IEEE IAS ComPE 2021
> (2nd IEEE IAS INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE 
> EVALUATION)
> 
> *Apologies if you received it multiple times.*
> 
> Conference Host: North-Eastern Hill University (Central University), 
> Shillong, India
> 
> Deadline for paper submission: July 30, 2021
> 
> Conference Date: December 1st-3rd, 2021
> 
> Conference Mode: ONLINE
> 
> IEEE Technical Co-Sponsor: IEEE Industry Applications Society USA
> 
> Financial Sponsor: Blue Amber Foundation, New Delhi, India
> 
> Paper Submission Link: https://easychair.org/conferences/?conf=compe2021 
> 
> 
> IEEE Paper Template: https://compe2021.com/author-guidelines/ 
> 
> 
> Conference Website: https://compe2021.com 
> 
> Keynote Speakers: https://compe2021.com/speakers/ 
> 
> 
> ===
> 
> REGISTRATION FEE: https://compe2021.com/registration-details/ 
> 
> 
> CALL FOR PAPERS: https://compe2021.com/regular-sessions/ 
> 
> 
> NOMINATIONS FOR AWARDS: https://compe2021.com/award-categories/ 
> 
> 
> CALL FOR SPECIAL SESSIONS: https://compe2021.com/special-sessions/ 
> 
> 
> CALL FOR SESSION CHAIR/REVIEWER: https://compe2021.com/tpc-member/ 
> 
> 
> Original contributions based on the results of research and development are 
> solicited. Prospective authors are requested to submit their papers in not 
> more than 6 pages, prepared in the two-column IEEE format. Submissions must 
> be plagiarism-free and must follow the IEEE Policy on Double 
> Submission.
> 
> Selected high-quality papers will be eligible for submission to IEEE 
> Transactions on Industry Applications (IF: 3.488, SCIE indexed) and IEEE 
> Industry Applications Magazine (IF: 1.093, SCIE indexed) for further review 
> (subject to IAS-related tracks). A second tier of quality papers will be 
> invited to submit extended versions for publication in special issues of SCI- 
> and Scopus-indexed journals.
> 
> All accepted and presented papers will be eligible for submission to IEEE 
> HQ for publication as e-proceedings in IEEE Xplore, which is indexed by the 
> world's leading abstracting & indexing (A&I) databases, including 
> ISI / SCOPUS / DBLP / EI-Compendex / Google Scholar.
> 
> IEEE IAS ComPE 2021 is a non-profit conference, and it will provide an 
> opportunity for practicing engineers, academicians and researchers to meet 
> in a forum to discuss various issues and their future directions in the fields of 
> Electrical, Computer & Electronics Engineering and Technologies, Biomedical 
> Engineering and Interdisciplinary research. The conference aims to bring 
> together experts from the relevant areas to disseminate their knowledge 
> and experience for future research. The conference is 
> technically co-sponsored by the IEEE Industry Applications Society, USA, and 
> financially sponsored by the Blue Amber Foundation, New Delhi. There are 11 
> tracks in the conference covering almost all areas of Electronics, Computer & 
> Electrical Engineering, Biomedical Engineering and Interdisciplinary research.
> 
> I would be grateful if you could kindly forward this CFP to your research 
> group and academic contacts.
> 
> Thanking you,
> Best Regards,
> Dr. Jitendra Kumar Verma
> (Conference Chair)
> Department of Computer Science & Engineering,
> Amity University, Gurugram, India
> Email: cont...@compe2021.com 
> Website: https://www.jk-verma.com/ 
> Conference Webpage: https://compe2021.com/ 
> 
> 
> To unsubscribe from the COMPE list, click the following link: 
> https://listserv.ieee.org/cgi-bin/wa?SUBED1=COMPE=1 
> 

Re: [google-appengine] Re: appcfg shutdown: earlier than scheduled?

2020-05-19 Thread 'Rahul Ravindran' via Google App Engine
And please send any app_ids privately to me since this list is public.

On Tue, May 19, 2020 at 4:32 PM Rahul Ravindran  wrote:

> Could you send the app-id for apps which are having trouble deploying via
> appcfg?
>
> On Tue, May 19, 2020 at 12:11 PM Linus Larsen 
> wrote:
>
>> I just tried updating another service (I'm using Java) which now fails:
>>
>> 98% Application deployment failed. Message: Deployments using appcfg are
>> no longer supported. See
>> https://cloud.google.com/appengine/docs/deprecations
>>
>> But I can update other services. Seriously, Google, what's going on?
>>
>> Deprecated on July 30, then extended to August 30, but it seems you
>> have decided to cut support for appcfg now?
>>
>> It's still at least 2 months away, come on!
>>
>> / Linus
>>
>>
>>
>> On Tuesday, May 19, 2020 at 4:47:15 PM UTC+2, PPG Apps wrote:
>>>
>>> I am using appcfg and have had no issues
>>>
>>> App Engine SDK
>>> release: "1.9.87"
>>> timestamp: 1571856179
>>> api_versions: ['1']
>>> supported_api_versions:
>>> python:
>>> api_versions: ['1']
>>> python27:
>>> api_versions: ['1']
>>> go:
>>> api_versions: ['go1', 'go1.9']
>>> java7:
>>> api_versions: ['1.0']
>>> go111:
>>> api_versions: [null]
>>> Python 2.5.2
>>> wxPython 2.8.8.1 (msw-unicode)
>>>
>>>
>>> On Thursday, May 14, 2020 at 4:45:56 AM UTC-4, OferR wrote:
>>>>
>>>>
>>>> Thanks Linus,
>>>>
>>>> 1. I didn't receive this email. I get other mail from the app engine
>>>> people, but not this one.
>>>>
>>>> 2. Google? Why am I getting this error now?
>>>>
>>>> 3. Can anybody else confirm that they can/cannot deploy using appcfg?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> On Thursday, May 14, 2020 at 7:29:49 PM UTC+12, Linus Larsen wrote:
>>>>>
>>>>> I don't know about you guys, but I got this in the mail:
>>>>>
>>>>> The legacy standalone App Engine SDK (appcfg) was deprecated as of *July
>>>>> 30, 2019*, in favor of the GA Cloud SDK
>>>>> <https://www.google.com/appserve/mkt/p/AM7kBiUYJrHta2waHnldNTcQ5ec6046vFvMmt8TRDBvxD7fkID713CKb9cOc-AI46tCJQDF5RFyv_zlz4Sh5vfOdM7c>.
>>>>> You must migrate your projects off the legacy standalone SDK (appcfg) by
>>>>> August 30, 2020. The migration deadline was extended from July 30, 2020, 
>>>>> to
>>>>> avoid service disruption.
>>>>>
>>>>> The deadline has been extended, why you get this error now is a
>>>>> mystery to me.
>>>>>
>>>>> / Linus
>>>>>
>>>>> On Thursday, 14 May 2020 at 05:19:39 UTC+2, OferR wrote:
>>>>>>
>>>>>>
>>>>>> Thanks for your comment.
>>>>>>
>>>>>> The document that you point to and other documents clearly suggest
>>>>>> that support for appcfg will be removed on July 30, 2020, which implies
>>>>>> that it should still be supported now.
>>>>>>
>>>>>> However, this does not seem to be the case as evident when trying to
>>>>>> deploy a new version to GAE using appcfg now.
>>>>>>
>>>>>> Can you please comment on this point.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thursday, May 14, 2020 at 2:56:38 PM UTC+12, Aref Amiri (Cloud
>>>>>> Platform Support) wrote:
>>>>>>>
>>>>>>> Based on this public documentation
>>>>>>> <https://cloud.google.com/appengine/docs/standard/java/sdk-gcloud-migration>,
>>>>>>> the appcfg tool, which is included in the standalone App Engine SDK, is
>>>>>>> deprecated as of July 30, 2019 and is replaced by the Cloud SDK
>>>>>>> <https://cloud.google.com/sdk/docs>. It will become unavailable for
>>>>>>> download on July 30, 2020.
>>>>>>>
>>>>>>> You may want to follow this documentation
>>>>>>> <https://cloud.google.com/appengine/docs/standard/java/tools/migrating-from-appcfg-to-gcloud>
>>>>>>> as it lists the equivalent commands for some frequently used AppCfg
>>>>>>> commands.
>>>>>>>
>>>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to google-appengine+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/google-appengine/3b1c9155-5561-489a-aa42-d4a3cb6608f0%40googlegroups.com
>> <https://groups.google.com/d/msgid/google-appengine/3b1c9155-5561-489a-aa42-d4a3cb6608f0%40googlegroups.com?utm_medium=email_source=footer>
>> .
>>
> --
> Sent from my phone. Excuse the typos.
>



Re: [google-appengine] Re: appcfg shutdown: earlier than scheduled?

2020-05-19 Thread 'Rahul Ravindran' via Google App Engine
Could you send the app-id for apps which are having trouble deploying via
appcfg?

On Tue, May 19, 2020 at 12:11 PM Linus Larsen 
wrote:

> I just tried updating another service (I'm using Java) which now fails:
>
> 98% Application deployment failed. Message: Deployments using appcfg are
> no longer supported. See
> https://cloud.google.com/appengine/docs/deprecations
>
> But I can update other services. Seriously, Google, what's going on?
>
> Deprecated on July 30, then extended to August 30, but it seems you
> have decided to cut support for appcfg now?
>
> It's still at least 2 months away, come on!
>
> / Linus
>
>
>
> On Tuesday, May 19, 2020 at 4:47:15 PM UTC+2, PPG Apps wrote:
>>
>> I am using appcfg and have had no issues
>>
>> App Engine SDK
>> release: "1.9.87"
>> timestamp: 1571856179
>> api_versions: ['1']
>> supported_api_versions:
>> python:
>> api_versions: ['1']
>> python27:
>> api_versions: ['1']
>> go:
>> api_versions: ['go1', 'go1.9']
>> java7:
>> api_versions: ['1.0']
>> go111:
>> api_versions: [null]
>> Python 2.5.2
>> wxPython 2.8.8.1 (msw-unicode)
>>
>>
>> On Thursday, May 14, 2020 at 4:45:56 AM UTC-4, OferR wrote:
>>>
>>>
>>> Thanks Linus,
>>>
>>> 1. I didn't receive this email. I get other mail from the app engine
>>> people, but not this one.
>>>
>>> 2. Google? Why am I getting this error now?
>>>
>>> 3. Can anybody else confirm that they can/cannot deploy using appcfg?
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Thursday, May 14, 2020 at 7:29:49 PM UTC+12, Linus Larsen wrote:

 I don't know about you guys, but I got this in the mail:

 The legacy standalone App Engine SDK (appcfg) was deprecated as of *July
 30, 2019*, in favor of the GA Cloud SDK
 .
 You must migrate your projects off the legacy standalone SDK (appcfg) by
 August 30, 2020. The migration deadline was extended from July 30, 2020, to
 avoid service disruption.

 The deadline has been extended, why you get this error now is a mystery
 to me.

 / Linus

 On Thursday, 14 May 2020 at 05:19:39 UTC+2, OferR wrote:
>
>
> Thanks for your comment.
>
> The document that you point to and other documents clearly suggest
> that support for appcfg will be removed on July 30, 2020, which implies
> that it should still be supported now.
>
> However, this does not seem to be the case as evident when trying to
> deploy a new version to GAE using appcfg now.
>
> Can you please comment on this point.
>
> Thanks
>
>
>
> On Thursday, May 14, 2020 at 2:56:38 PM UTC+12, Aref Amiri (Cloud
> Platform Support) wrote:
>>
>> Based on this public documentation
>> ,
>> the appcfg tool, which is included in the standalone App Engine SDK, is
>> deprecated as of July 30, 2019 and is replaced by the Cloud SDK
>> . It will become unavailable for
>> download on July 30, 2020.
>>
>> You may want to follow this documentation
>> 
>> as it lists the equivalent commands for some frequently used AppCfg
>> commands.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/3b1c9155-5561-489a-aa42-d4a3cb6608f0%40googlegroups.com
> 
> .
>
-- 
Sent from my phone. Excuse the typos.



Re: [google-appengine] Re: Getting GAE to use GZIP encoding in responses

2019-04-04 Thread 'Rahul Ravindran' via Google App Engine
We consider JPEG, MPEG and some other file formats not compressible, and hence
do not compress those content types.

~Rahul.

On Thu, Apr 4, 2019 at 4:26 PM Joshua Smith 
wrote:

> I didn’t get an answer in either place. But my experience has been that
> this list tends to produce answers whereas SO is just asking into the void.
>
> On Apr 4, 2019, at 5:42 PM, 'Nicolas (Google Cloud Platform Support)' via
> Google App Engine  wrote:
>
>
> Hi Joshua,
>
> Thank you for posting here; however, I can see on your StackOverflow thread
> that the issue is resolved for you, but I would like to know the root cause of
> your issue.
>
> As this seems to be a bit more of a technical question, you will probably
> have better answers from the community by posting on StackOverflow, as
> Google Groups is intended for general discussion.
>
>
> On Thursday, March 14, 2019 at 9:05:48 AM UTC-4, Joshua Smith wrote:
>>
>> Is there a trick to convince GAE to GZIP responses? The docs say it’s
>> supposed to be automatic, but it isn’t happening. Details in this SO
>> question…
>>
>>
>> https://stackoverflow.com/questions/55145439/how-can-i-serve-gzip-encoded-images-from-google-app-engine
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/911a8289-5997-4e48-824b-8f2ea5264fb2%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/DF8EF245-23E6-4E02-954A-1A5A72BF6E15%40gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

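A quick way to confirm this behaviour from outside is to request a resource with gzip accepted and inspect the Content-Encoding header. Below is a minimal sketch in Python; the URL is a placeholder for an endpoint on your own app, not a real one.

    # Minimal sketch: check whether App Engine returned a gzip-encoded body.
    # The URL is a placeholder; substitute an endpoint from your own app.
    import urllib.request

    req = urllib.request.Request(
        "https://your-app.appspot.com/some/text/resource",
        headers={"Accept-Encoding": "gzip", "User-Agent": "gzip-check"},
    )
    with urllib.request.urlopen(req) as resp:
        # Compressible types (e.g. text/html, text/css) should report "gzip" here;
        # JPEG/MPEG responses will come back without a Content-Encoding header.
        print(resp.headers.get("Content-Encoding"))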


Re: [google-appengine] Quickstart for Python 3 in the App Engine Standard Environment

2019-03-19 Thread 'Rahul Ravindran' via Google App Engine
Looks like a typo. Could you try app-engine-python?
  I will file a bug to fix the doc.

On Tue, Mar 19, 2019 at 10:01 PM Will H  wrote:

> In the quickstart here:
> https://cloud.google.com/appengine/docs/standard/python3/quickstart
> There is a step to install the gcloud component app-engine-python3
>
> gcloud components install app-engine-python3
>
> Apparently, this component does not exist though (at least that is the
> error I receive).
>
> Is this is a typo or is there something I need to do so that gcloud
> recognizes the component?
>
> I have confirmed I have the latest version of gcloud, version 239.0.0
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/d1c48192-212d-48ad-9636-4fe75a4087cb%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: [google-appengine] Existing ndb data -> Python 3 data

2019-02-12 Thread 'Rahul Ravindran' via Google App Engine
That is the idea. I encourage you to participate in the early releases etc
to ensure your use case is being met. You may have additional steps to
enable caching.

On Tue, Feb 12, 2019 at 5:30 PM Bruce Sherwood 
wrote:

> That is very good news indeed. It's not immediately obvious from that
> site's Migration document whether the new ndb library is intended just to
> be something that makes it easy for someone familiar with the existing
> ndb library to build a new GAE app, or whether the intent is to make it
> possible for people like me to access an existing ndb database in an old
> GAE app whose Python 2.7 server is replaced by a Python 3 server. Can I
> assume that one of the goals is to be able to use an old ndb database?
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/f4d2ee37-3714-4af9-88f7-5ba6f68b56b7%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

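For readers following the migration, here is a minimal sketch of what code against the new Python 3 ndb client looks like. The model name and fields are hypothetical stand-ins for an existing Python 2.7 ndb kind, and caching is not enabled by default (it is opted into via the context's global_cache parameter), which is the kind of additional step mentioned above.

    # Minimal sketch, assuming the google-cloud-ndb package is installed.
    from google.cloud import ndb


    class UserRecord(ndb.Model):  # hypothetical model mirroring an existing ndb kind
        email = ndb.StringProperty()
        created = ndb.DateTimeProperty(auto_now_add=True)


    client = ndb.Client()

    # Unlike the old runtime, every ndb operation must run inside an explicit context;
    # pass global_cache=... here if a memcache/Redis-style cache is needed.
    with client.context():
        key = UserRecord(email="user@example.com").put()
        print(key.get().email, UserRecord.query().count())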


Re: [google-appengine] Existing ndb data -> Python 3 data

2019-02-12 Thread 'Rahul Ravindran' via Google App Engine
Development of the new Python 3-compatible ndb client is happening in the
Google Cloud Python client library github repo at
https://github.com/googleapis/google-cloud-python/tree/master/ndb . The
library is not usable as-is yet, but work is in progress and can be
monitored there.

On Tue, Feb 12, 2019 at 3:47 PM Bruce Sherwood 
wrote:

> I've seen the documentation on Python 3 datastores, but what I haven't
> seen is how to deal with preserving existing user data.
>
> My existing GAE (Python 2.7, standard environment, ndb) has 60,000 user
> records. Perhaps one way to preserve this data would be to add code to my
> current Python server to dump the data in a format I understand, switch GAE
> to use the new datastore mechanism, then have the server upload my data and
> place the data into the new datastore.
>
> However, I'm surely not the only person facing this challenge, so is there
> some other/better approach?
>
> Bruce
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/e1ab0052-3f6a-4d41-a98c-4a88c5d42adf%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

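One practical note on preserving the existing records: the 60,000 user records live in Cloud Datastore itself rather than in the Python 2.7 runtime, so they remain queryable from Python 3 without a dump-and-reload. Below is a minimal sketch using the Cloud Datastore client; the project id and kind name are placeholders for the existing app's values (ndb model kinds map directly to Datastore kinds).

    # Minimal sketch: read entities written by an old Python 2.7 ndb app from Python 3.
    # "my-project-id" and the kind "UserRecord" are placeholders.
    from google.cloud import datastore

    client = datastore.Client(project="my-project-id")
    query = client.query(kind="UserRecord")
    for entity in query.fetch(limit=5):
        print(entity.key, dict(entity))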


Re: [google-appengine] Re: GAE python 2.7 end of life

2019-01-09 Thread 'Rahul Ravindran' via Google App Engine
Google has a one-year deprecation policy for any GA runtime.
Given that nothing has been announced yet, please know that your
application will continue running for at least a year, and that will be
the *minimum* period before you need to do anything.

I apologize for being very brief about this at this point, but there are
discussions going on. Stay tuned for longer-term guidance around this issue.

~Rahul.

On Wed, Jan 9, 2019 at 6:47 AM bFlood  wrote:

> it's not the incompatibility of python (language/runtime), it's the missing
> services with regard to GAE Standard 2.7 and GAE Standard 3.0 (memcache,
> NDB, Search, Images, Users, webapp2, etc). Are these going to be duplicated
> in GAE3? and if so, will existing data and model definitions in the 2.7
> datastore work in 3?
>
> it looks like some work is being done for this but it would be great to
> know officially what Google plans for this upgrade process. how much code
> will need to change?
>
> https://github.com/googleapis/google-cloud-python/issues?q=NDB+sort%3Aupdated-desc
>
> also, generally what do you mean by "not directly affected on an immediate
> time frame"? a year, 2 years, 5 years before 2.7 apps stop running?
>
>
>
>
> On Wednesday, January 9, 2019 at 9:32:36 AM UTC-5, George (Cloud Platform
> Support) wrote:
>>
>> You are perfectly right, NP. Python 3 is incompatible with 2, and the
>> Python 3 runtime does not support quite a few features, so it will require
>> effort to re-program your app in an alternative manner. I was simply saying
>> that often the effort is not enormous, and becomes worthwhile in the longer
>> term.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/8f45b06f-0852-4e46-9331-c3ebdbe3e481%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: [google-appengine] Re: Python 3.7 service killed after exceeding memory limit

2018-11-20 Thread 'Rahul Ravindran' via Google App Engine
Hello,
  Your measurement of your application on your laptop does not accurately
represent all the memory used. First, you will need to look at the RSS
memory for the process. In addition, any resources taken by the operating
system and kernel are not accounted for in your measurement but are accounted
for in ours. We are not running a Python process alone, but giving you a
complete, isolated Linux runtime environment in addition to the Python
runtime environment.

~Rahul.

On Tue, Nov 20, 2018 at 2:17 PM vvv vvv  wrote:

> BTW this is in the Standard environment
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/12f271c6-4eb9-4fea-a022-2bb581471e14%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

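To make a local measurement closer to what the runtime tracks, one option is to look at the process's resident set size rather than summing Python object sizes. A minimal sketch using only the standard library (the resource module is available on Unix-like systems); it still excludes the OS and kernel overhead mentioned above, so treat the number as a lower bound.

    # Minimal sketch: report this process's peak resident set size (RSS).
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    import resource
    import sys

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # normalize macOS bytes to kilobytes
    print(f"peak RSS: {peak / 1024:.1f} MiB")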


Re: [google-appengine] Python 3.7 and Django 2.x

2018-11-10 Thread 'Rahul Ravindran' via Google App Engine
Not an exact match, but close - here is a sample with Django, Python 3.7
and Cloud SQL:


On Sat, Nov 10, 2018 at 4:21 PM Charles tenorio 
wrote:

> Is anyone using Django 2.0 on App Engine with Python 3.7 and Cloud
> Datastore? If so, could you send me an example of CRUD? Thank you
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/ad46b3e1-130a-41f2-9356-77b5ce77ed35%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

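Since the sample link did not survive the archive, here is a minimal sketch of the database piece of a Django settings.py for Cloud SQL (PostgreSQL) on App Engine standard. Every identifier below (project, region, instance, database, user, password) is a placeholder, and Cloud Datastore would instead be accessed through its own client library rather than Django's ORM.

    # Minimal sketch of a DATABASES setting targeting Cloud SQL; all values are placeholders.
    import os

    if os.getenv("GAE_APPLICATION"):
        # On App Engine standard the database is reached through a unix socket.
        DATABASES = {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "HOST": "/cloudsql/my-project:us-central1:my-instance",
                "NAME": "mydatabase",
                "USER": "myuser",
                "PASSWORD": "mypassword",
            }
        }
    else:
        # Local development goes through the Cloud SQL proxy on 127.0.0.1.
        DATABASES = {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "HOST": "127.0.0.1",
                "PORT": "5432",
                "NAME": "mydatabase",
                "USER": "myuser",
                "PASSWORD": "mypassword",
            }
        }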


Re: [google-appengine] Persistent instance and out of memory with automatic scaling and no traffic

2018-10-09 Thread 'Rahul Ravindran' via Google App Engine
The instance might stay alive after it's been idle for 15 minutes, but you
won't be billed for it. Billing is based on 15-minute blocks, as long as there
is at least one active request in the 15-minute block.

We kill clones lazily to prevent excessive cold starts.


On Tue, Oct 9, 2018 at 5:22 AM vvv vvv  wrote:

> I am on the AppEngine Standard Python 3.7 environment. I have set up a
> cron job to execute every two hours. This cron job creates a bunch of
> threads, they execute some task and finish after some seconds. My web
> service receives no traffic other than this currently.
>
> I am attaching the number of instances graph, and the memory usage. In the
> instances graph the blue line shows that an instance runs for 15 minutes
> every two hours, which is what one would expect with automatic scaling and
> scaling to 0 instances. The green line shows that there are two instances
> running persistently, with the only exception at 23:00 when one instance
> gets killed because it goes out of memory, as one can see from the memory
> graph attached.
>
> My question is, why is there one instance running all the time? Who has
> created it? I'm trying to sort out my memory issue. Thanks a lot.
> P.S. I attached the utilization graph to rule out the possibility that my
> instance is still alive after 15 minutes.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/44af4256-0b24-46fd-b971-515b1944c3de%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: [google-appengine] Re: Can't run sample applications on AppEngine Stanadard Python3.7

2018-09-28 Thread 'Rahul Ravindran' via Google App Engine
Did you have a chance to look at
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/datastore/cloud-client
?

On Fri, Sep 28, 2018 at 10:50 AM vvv vvv  wrote:

> Hi George, thanks for answering. dev_appserver.py is for the standard
> environment Python 2.7, I'm trying to run a Python 3.7 web app.
>
> On Friday, 28 September 2018 at 1:28:34 (UTC+1), vvv vvv wrote:
>>
>>
>> Hello, after installing everything, I am trying to run
>> * 
>> ~/python-docs-samples/appengine/standard_python37/building-an-app/building-an-app-2*
>> *building-an-app/building-an-app-1* built and run successfully, my
>> problem is with the rest in the series which use the cloud datastore.
>> When i start my virtual environment and run
>> pip3 install -r requirements.txt
>>
>> python3 main.py
>>
>>
>> I get Traceback (most recent call last):
>>   File "main.py", line 20, in 
>> from google.cloud import datastore
>>   File
>> "/home/neptune/env/local/lib/python3.6/site-packages/google/cloud/datastore/__init__.py"
>> , line 61, in 
>> from google.cloud.datastore.batch import Batch
>>   File
>> "/home/neptune/env/local/lib/python3.6/site-packages/google/cloud/datastore/batch.py"
>> , line 24, in 
>> from google.cloud.datastore import helpers
>>   File
>> "/home/neptune/env/local/lib/python3.6/site-packages/google/cloud/datastore/helpers.py"
>> , line 24, in 
>> from google.type import latlng_pb2
>> ModuleNotFoundError: No module named 'google.type'
>>
>>
>> If on the other hand I run:
>>
>> pip install -r requirements.txt
>>
>> python main.py
>>
>>
>> The output is:
>>   File "main.py", line 22, in 
>> datastore_client = datastore.Client()
>> ...
>> psep = app_id.find(_PARTITION_SEPARATOR)
>> AttributeError: 'NoneType' object has no attribute 'find'
>>
>>
>> I found here (
>> https://github.com/GoogleCloudPlatform/google-cloud-datastore/issues/168)
>> in the end that I have to set APPLICATION_ID. How do I do this, and is this
>> documented somewhere?
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/5d3c5a27-d2d1-4619-9b47-4df1378828a3%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

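For reference, the cloud-client sample linked above reduces to something like the following. This is a minimal sketch: the project id, kind and properties are placeholders, credentials are assumed to come from application default credentials, and the project is passed explicitly rather than inferred from an APPLICATION_ID-style environment variable.

    # Minimal sketch of Cloud Datastore client usage along the lines of the linked sample.
    from google.cloud import datastore

    client = datastore.Client(project="my-project-id")  # placeholder project id

    key = client.key("Task", "sample-task")
    entity = datastore.Entity(key=key)
    entity.update({"description": "Buy milk", "done": False})
    client.put(entity)

    print(client.get(key))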


Re: [google-appengine] using 3rd party libraries on GAE standard with Python

2018-09-25 Thread 'Rahul Ravindran' via Google App Engine
Unfortunately, dev_appserver does not yet work with Python 3.x (see
https://cloud.google.com/appengine/docs/standard/python3/testing-and-deploying-your-app#local-dev-server).
You need to run it from a virtualenv which is running Python 2.7.

Alternatively, as specified in
https://cloud.google.com/appengine/docs/standard/python3/testing-and-deploying-your-app,
you can choose to not use dev_appserver at all.

On Tue, Sep 25, 2018 at 2:41 PM Dewey Gaedcke  wrote:

> $ make serve
>
> export
> CLOUDSDK_PYTHON=/Users/dgaedcke/dev/client/nmg/nmg_payments_api/env/bin
>
> dev_appserver.py --clear_datastore 0 --logs_path=/tmp/gaelogs
> --log_level=warning \
>
> --host 0.0.0.0 app.yaml
>
> ERROR: Python 3 and later is not compatible with the Google Cloud SDK.
> Please use Python version 2.7.x.
>
>
> If you have a compatible Python interpreter installed, you can use it by
> setting
>
> the CLOUDSDK_PYTHON environment variable to point to it.
>
>
>
> gcloud --version
>
> Google Cloud SDK 217.0.0
>
> app-engine-go
>
> app-engine-python 1.9.75
>
> app-engine-python-extras 1.9.74
>
> bq 2.0.34
>
> cloud-datastore-emulator 2.0.2
>
> core 2018.09.17
>
> gsutil 4.34
>
> Updates are available for some Cloud SDK components.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/11c13a74-2620-492c-919e-9481cb814894%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>

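Since the failure above comes from CLOUDSDK_PYTHON pointing at a Python 3 virtualenv, a small diagnostic can confirm which interpreter the SDK would pick up before dev_appserver is run. This is only an illustrative sketch; the environment variable name is real, but the check itself is an assumption about how you want to verify it.

    # Minimal sketch: report which interpreter CLOUDSDK_PYTHON points at.
    import os
    import subprocess

    interpreter = os.environ.get("CLOUDSDK_PYTHON", "python")
    result = subprocess.run([interpreter, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr, Python 3 to stdout.
    print((result.stdout or result.stderr).strip())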


Re: [google-appengine] using 3rd party libraries on GAE standard with Python

2018-09-25 Thread 'Rahul Ravindran' via Google App Engine
Could you paste the entire command and its output?

Additionally, which version of Google Cloud SDK are you using?

On Tue, Sep 25, 2018 at 2:25 PM Dewey Gaedcke  wrote:

> Thanks for the response and clarification!!
> I remember being told way back NOT to use venv with GAE & so all these
> posts where it is now being shown is very confusing.
>
> Using Py3.7, I'm now getting:
>
> ERROR: Python 3 and later is not compatible with the Google Cloud SDK.
> Please use Python version 2.7.x.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/d46e86e4-7494-46f0-97d4-d6c9e6bed2a7%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: [google-appengine] Issue with spaCy library in the new GAE Standard Environment python37 runtime

2018-09-22 Thread 'Rahul Ravindran' via Google App Engine
That is great. Do you mind sharing a sample of your code, requirements.txt
and app.yaml which we could use to look at this use case?

Additionally, by slow, do you mean every request or just the first one? If you
could share your image or code, that would help us debug further.


On Sat, Sep 22, 2018 at 2:58 PM BLONDEV INC  wrote:

> Hey, I added the model to requirements.txt and made some modifications to
> account for that. It is now working, if VERY slowly.
>
> On Saturday, September 22, 2018 at 2:45:47 PM UTC-4, Rahul Ravindran wrote:
>>
>> So, you cannot install anything outside of requirements.txt into our
>> environment. You could download everything into your source folder and that
>> may work, but if you are attempting to download models and write them
>> somewhere, that is not supported.
>>
>> On Sat, Sep 22, 2018 at 11:43 AM BLONDEV INC  wrote:
>>
> Then I installed the model I wanted,
>>>
>>> (flask3) ➜  readable001 git:(master) python -m spacy download en
>>> ...You can now load the model via spacy.load('en')
>>>
>>> and now it works on THIS machine, too...
>>>
>>> [image: Snip20180922_106.png]
>>>
>>>
>>>
>>>
>>> Seems the models for spaCy are not installed in the GAE2 Standard
>>> Environment.
>>> We want to use GAE.
>>> Please HELP.
>>>
>>> On Saturday, September 22, 2018 at 2:24:43 PM UTC-4, BLONDEV INC wrote:
>>>>
>>>> Having an issue, however, on another machine using a different
>>>> environment.
>>>> This is a message I got while installing spaCy:
>>>>
>>>> (flask3) ➜  readable001 git:(master) python -m spacy validate
>>>>
>>>> Installed models (spaCy v2.0.12)
>>>> /Users/rose/WORK/VENVS/flask3/lib/python3.7/site-packages/spacy
>>>>
>>>>
>>>> No models found in your current environment.
>>>>
>>>>
>>>> And then, the error message when I try to run from THIS machine:
>>>>
>>>> OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut
>>>> link, a Python package or a valid path to a data directory.
>>>>
>>>>
>>>>
>>>> On Saturday, September 22, 2018 at 2:21:19 PM UTC-4, BLONDEV INC wrote:
>>>>>
>>>>> Yes. It works just fine...
>>>>>
>>>>>
>>>>>
>>>>> [image: PHOTO-2018-09-22-14-17-42.jpg]
>>>>>
>>>>>
>>>>> On Saturday, September 22, 2018 at 1:09:53 PM UTC-4, Rahul Ravindran
>>>>> wrote:
>>>>>>
>>>>>> We don't use Conda. This seems like an issue with your application.
>>>>>> Can you run this locally successfully?
>>>>>>
>>>>>> On Sat, Sep 22, 2018 at 9:50 AM BLONDEV INC 
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am getting this error message when I make a GET request to my
>>>>>>> app's URL.
>>>>>>>
>>>>>>> File "/srv/main.py", line 12, in break_sentences nlp =
>>>>>>> spacy.load('en') File 
>>>>>>> "/env/lib/python3.7/site-packages/spacy/__init__.py",
>>>>>>> line 15, in load return util.load_model(name, **overrides) File
>>>>>>> "/env/lib/python3.7/site-packages/spacy/util.py", line 119, in 
>>>>>>> load_model
>>>>>>> raise IOError(Errors.E050.format(name=name)) OSError: [E050] Can't find
>>>>>>> model 'en'. It doesn't seem to be a shortcut link, a Python package or a
>>>>>>> valid path to a data directory."
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am using the new version of GAE Standard Environment with python37
>>>>>>> runtime.
>>>>>>>
>>>>>>> Someone with a similar issue in a different context was able to
>>>>>>> solve by updating their version of spaCy. It seems that if you install 
>>>>>>> with
>>>>>>> conda, this issue arises.
>>>>>>>
>>>>>>> Could it be possible that the Google folks used conda to install
>>>>>>> spaCy to the new Standard Environment and that is why this is happening
>
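For anyone landing on this thread later: the fix that worked above was to install the model as a regular dependency in requirements.txt rather than running python -m spacy download at runtime, since the deployed filesystem only contains what requirements.txt installs. Below is a minimal sketch of the loading side, assuming the en_core_web_sm model wheel (whichever release matches your spaCy version) has been added to requirements.txt.

    # Minimal sketch, assuming requirements.txt installs the en_core_web_sm model package
    # alongside spacy, so the model is importable like any other dependency.
    import en_core_web_sm

    # Loading by package name avoids the 'en' shortcut link that does not exist
    # in the deployed environment (the OSError [E050] above).
    nlp = en_core_web_sm.load()

    doc = nlp("App Engine installs only what requirements.txt declares.")
    print([sentence.text for sentence in doc.sents])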

Re: [google-appengine] Issue with spaCy library in the new GAE Standard Environment python37 runtime

2018-09-22 Thread 'Rahul Ravindran' via Google App Engine
So, you cannot install anything outside of requirements.txt into our
environment. You could download everything into your source folder and that
may work, but if you are attempting to download models and write them
somewhere, that is not supported.

On Sat, Sep 22, 2018 at 11:43 AM BLONDEV INC  wrote:

> Then I installed the model I wanted,
>
> (flask3) ➜  readable001 git:(master) python -m spacy download en
> ...You can now load the model via spacy.load('en')
>
> and now it works on THIS machine, too...
>
> [image: Snip20180922_106.png]
>
>
>
>
> Seems the models for spaCy are not installed in the GAE2 Standard
> Environment.
> We want to use GAE.
> Please HELP.
>
> On Saturday, September 22, 2018 at 2:24:43 PM UTC-4, BLONDEV INC wrote:
>>
>> Having an issue, however, on another machine using a different
>> environment.
>> This is a message I got while installing spaCy:
>>
>> (flask3) ➜  readable001 git:(master) python -m spacy validate
>>
>> Installed models (spaCy v2.0.12)
>> /Users/rose/WORK/VENVS/flask3/lib/python3.7/site-packages/spacy
>>
>>
>> No models found in your current environment.
>>
>>
>> And then, the error message when I try to run from THIS machine:
>>
>> OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut
>> link, a Python package or a valid path to a data directory.
>>
>>
>>
>> On Saturday, September 22, 2018 at 2:21:19 PM UTC-4, BLONDEV INC wrote:
>>>
>>> Yes. It works just fine...
>>>
>>>
>>>
>>> [image: PHOTO-2018-09-22-14-17-42.jpg]
>>>
>>>
>>> On Saturday, September 22, 2018 at 1:09:53 PM UTC-4, Rahul Ravindran
>>> wrote:
>>>>
>>>> We don't use Conda. This seems like an issue with your application. Can
>>>> you run this locally successfully?
>>>>
>>>> On Sat, Sep 22, 2018 at 9:50 AM BLONDEV INC  wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am getting this error message when I make a GET request to my app's
>>>>> URL.
>>>>>
>>>>> File "/srv/main.py", line 12, in break_sentences nlp =
>>>>> spacy.load('en') File 
>>>>> "/env/lib/python3.7/site-packages/spacy/__init__.py",
>>>>> line 15, in load return util.load_model(name, **overrides) File
>>>>> "/env/lib/python3.7/site-packages/spacy/util.py", line 119, in load_model
>>>>> raise IOError(Errors.E050.format(name=name)) OSError: [E050] Can't find
>>>>> model 'en'. It doesn't seem to be a shortcut link, a Python package or a
>>>>> valid path to a data directory."
>>>>>
>>>>>
>>>>>
>>>>> I am using the new version of GAE Standard Environment with python37
>>>>> runtime.
>>>>>
>>>>> Someone with a similar issue in a different context was able to solve
>>>>> by updating their version of spaCy. It seems that if you install with
>>>>> conda, this issue arises.
>>>>>
>>>>> Could it be possible that the Google folks used conda to install spaCy
>>>>> to the new Standard Environment and that is why this is happening here?
>>>>>
>>>>> In any case... Can someone please HELP ME fix this and get my app
>>>>> running properly?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Google App Engine" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to google-appengi...@googlegroups.com.
>>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/google-appengine.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/google-appengine/9eaffe10-b3ca-446e-ab73-e81394dff2c7%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/google-appengine/9eaffe10-b3ca-446e-ab73-e81394dff2c7%40googlegroups.com?utm_medium=email_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.

Re: [google-appengine] Issue with spaCy library in the new GAE Standard Environment python37 runtime

2018-09-22 Thread 'Rahul Ravindran' via Google App Engine
We don't use Conda. This seems like an issue with your application. Can you
run this locally successfully?

On Sat, Sep 22, 2018 at 9:50 AM BLONDEV INC  wrote:

>
> Hi,
>
> I am getting this error message when I make a GET request to my app's URL.
>
> File "/srv/main.py", line 12, in break_sentences nlp = spacy.load('en')
> File "/env/lib/python3.7/site-packages/spacy/__init__.py", line 15, in load
> return util.load_model(name, **overrides) File
> "/env/lib/python3.7/site-packages/spacy/util.py", line 119, in load_model
> raise IOError(Errors.E050.format(name=name)) OSError: [E050] Can't find
> model 'en'. It doesn't seem to be a shortcut link, a Python package or a
> valid path to a data directory."
>
>
>
> I am using the new version of GAE Standard Environment with python37
> runtime.
>
> Someone with a similar issue in a different context was able to solve by
> updating their version of spaCy. It seems that if you install with conda,
> this issue arises.
>
> Could it be possible that the Google folks used conda to install spaCy to
> the new Standard Environment and that is why this is happening here?
>
> In any case... Can someone please HELP ME fix this and get my app running
> properly?
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/9eaffe10-b3ca-446e-ab73-e81394dff2c7%40googlegroups.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



[issue33467] Python 3.7: profile-opt build errors because a test seems to hang

2018-05-11 Thread Rahul Ravindran

New submission from Rahul Ravindran <rahu...@gmail.com>:

make run_profile_task

runs the tests and, based on my reading of the Makefile, does not seem to have
any mechanism to exclude tests.

Previously, on Python 3.6, the test test_poplib was failing
(https://bugs.python.org/issue32753) and the profile task would ignore
failing tests.

Now, with the Python 3.7 build, the test seems to hang, and hence profile-opt
builds cannot be built.
Attached is the trace of the build based on Python-3.7.0b4.

--
components: Build
files: stack_trace1.txt
messages: 316408
nosy: Rahul Ravindran
priority: normal
severity: normal
status: open
title: Python 3.7: profile-opt build errors because a test seems to hang
versions: Python 3.7
Added file: https://bugs.python.org/file47582/stack_trace1.txt

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33467>
___



[google-appengine] Re: [Google Cloud Insiders] A lot of instability in Google App Engine Standard today

2017-10-31 Thread 'Rahul Ravindran' via Google App Engine
Hello,
  What is your app-id where you are seeing this?
Thanks,
~Rahul.

On Tue, Oct 31, 2017 at 1:37 PM, PK  wrote:

> Many requests fail, usually Ajax calls, but I just got one in the UI. I am
> on the US Central / Python runtime. Is anybody else experiencing instability?
>
> Error: Server Error. The server encountered an error and could not complete
> your request.
>
> Please try again in 30 seconds.
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google Cloud Insiders" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-cloud-insiders+unsubscr...@googlegroups.com.
> To post to this group, send email to google-cloud-insiders@
> googlegroups.com.
> Visit this group at https://groups.google.com/group/google-cloud-insiders.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/google-cloud-insiders/1F1EA516-B410-4EE7-ABFA-
> 06E5E704F100%40gae123.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>



Re: Load Balancing sink group not working

2015-04-16 Thread Rahul Ravindran
Looking further into jstack, it looks like in the default configuration
(without sink groups) there is a thread per sink, while there is only one
thread when using the sink group, which would explain the drop in throughput
when using sink groups. Am I missing something?
~Rahul.


 On Thursday, April 16, 2015 12:47 PM, Rahul Ravindran rahu...@yahoo.com 
wrote:
   

 Hi,
Below is my Flume config; I am attempting to get a load-balancing sink
group to load-balance across multiple machines. I see only 2 threads created
for the entire sink group when using the load-balancing sink group, and I see
the below message in the logs (and I see no throughput draining events from the
channel). On the other hand, if I comment out the sink group definition from
the Flume config and thus use the DefaultSinkProcessor, I see a lot more
threads and events drain a lot faster. I suspect this is a problem with my
config, but I could not find anything obvious. Could anyone here help?
Flume log output:

flume.log: 16 Apr 2015 17:18:07,549 INFO [main] (org.apache.flume.node.Application.startAllComponents:138) - Starting new configuration:{ sourceRunners:{netcat=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:netcat,state:IDLE} }, spool=EventDrivenSourceRunner: { source:org.apache.flume.source.SpoolDirectorySource{name:spool,state:IDLE} }} sinkRunners:{mainSinks=SinkRunner: { policy:org.apache.flume.sink.LoadBalancingSinkProcessor@15f66cff counterGroup:{ name:null counters:{} } }, replaySinks=SinkRunner: { policy:org.apache.flume.sink.LoadBalancingSinkProcessor@656de49c counterGroup:{ name:null counters:{} } }} channels:{mainChannel=org.apache.flume.channel.MemoryChannel{name: mainChannel}, replayChannel=org.apache.flume.channel.MemoryChannel{name: replayChannel}} }

Flume config:

agent1.channels.mainChannel.type = MEMORY
agent1.channels.mainChannel.capacity = 15
agent1.channels.mainChannel.transactionCapacity = 1

agent1.channels.replayChannel.type = MEMORY
agent1.channels.replayChannel.capacity = 5
agent1.channels.replayChannel.transactionCapacity = 5000

# netcat source
agent1.sources.netcat.channels = mainChannel
agent1.sources.netcat.type = netcat
agent1.sources.netcat.bind = 127.0.0.1
agent1.sources.netcat.port = 4
agent1.sources.netcat.ack-every-event = false
agent1.sources.netcat.max-line-length = 8192

# spool directory source
agent1.sources.spool.channels = replayChannel
agent1.sources.spool.type = spooldir
agent1.sources.spool.bufferMaxLineLength = 8192
agent1.sources.spool.bufferMaxLines = 1000
agent1.sources.spool.batchSize = 1000
agent1.sources.spool.spoolDir = /br/agent_aud/replay
agent1.sources.spool.inputCharset = ISO-8859-1
# Label the event as a replayed event
agent1.sources.spool.interceptors = staticInterceptor
agent1.sources.spool.interceptors.staticInterceptor.type = static
agent1.sources.spool.interceptors.staticInterceptor.key = t
agent1.sources.spool.interceptors.staticInterceptor.value = r

agent1.sinks.avroMainSink1.type = avro
agent1.sinks.avroMainSink1.channel = mainChannel
agent1.sinks.avroMainSink1.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroMainSink1.port = 4545
agent1.sinks.avroMainSink1.connect-timeout = 3
agent1.sinks.avroMainSink1.request-timeout = 2
agent1.sinks.avroMainSink1.batch-size = 200

agent1.sinks.avroReplaySink1.type = avro
agent1.sinks.avroReplaySink1.channel = replayChannel
agent1.sinks.avroReplaySink1.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroReplaySink1.port = 4545
agent1.sinks.avroReplaySink1.connect-timeout = 30
agent1.sinks.avroReplaySink1.batch-size = 2000

agent1.sinks.avroMainSink2.type = avro
agent1.sinks.avroMainSink2.channel = mainChannel
agent1.sinks.avroMainSink2.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroMainSink2.port = 4546
agent1.sinks.avroMainSink2.connect-timeout = 3
agent1.sinks.avroMainSink2.request-timeout = 2
agent1.sinks.avroMainSink2.batch-size = 200

agent1.sinks.avroReplaySink2.type = avro
agent1.sinks.avroReplaySink2.channel = replayChannel
agent1.sinks.avroReplaySink2.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroReplaySink2.port = 4546
agent1.sinks.avroReplaySink2.connect-timeout = 30
agent1.sinks.avroReplaySink2.batch-size = 2000

agent1.sinks.avroMainSink3.type = avro
agent1.sinks.avroMainSink3.channel = mainChannel
agent1.sinks.avroMainSink3.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroMainSink3.port = 4547
agent1.sinks.avroMainSink3.connect-timeout = 3
agent1.sinks.avroMainSink3.request-timeout = 2
agent1.sinks.avroMainSink3.batch-size = 200

agent1.sinks.avroReplaySink3.type = avro
agent1.sinks.avroReplaySink3.channel = replayChannel
agent1.sinks.avroReplaySink3.hostname = flumefs-v01-00a.bento.btrll.com
agent1.sinks.avroReplaySink3.port = 4547
agent1.sinks.avroReplaySink3.connect-timeout = 30
agent1.sinks.avroReplaySink3.batch-size = 2000

agent1.sinks.avroMainSink4.type = avro
agent1
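
The sink group stanza referred to above (the part commented in and out) is not included in the config as posted, so for reference here is a rough sketch of what a load-balancing definition for the two groups named in the log output might look like. The group and sink names are taken from the log and config above; the selector and backoff settings are illustrative, not the poster's actual values. Worth noting: the log shows exactly one SinkRunner per group, and a sink group is drained by that single runner thread, while the DefaultSinkProcessor gives every sink its own runner, which by itself is consistent with seeing only two threads.

agent1.sinkgroups = mainSinks replaySinks

agent1.sinkgroups.mainSinks.sinks = avroMainSink1 avroMainSink2 avroMainSink3 avroMainSink4
agent1.sinkgroups.mainSinks.processor.type = load_balance
agent1.sinkgroups.mainSinks.processor.selector = round_robin
agent1.sinkgroups.mainSinks.processor.backoff = true

agent1.sinkgroups.replaySinks.sinks = avroReplaySink1 avroReplaySink2 avroReplaySink3
agent1.sinkgroups.replaySinks.processor.type = load_balance
agent1.sinkgroups.replaySinks.processor.selector = round_robin
agent1.sinkgroups.replaySinks.processor.backoff = true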

Re: Determining regions with low HDFS locality index

2014-12-27 Thread Rahul Ravindran
Thanks for the response Lars.
My question is not related to cluster or master startup so much as to a running 
cluster. My scenario is more about the case where, in a running cluster, a machine goes 
down and its regions get moved to other machines; locality is then impacted.
I wanted to find a mechanism for me to query and determine the regions which 
have poor locality from a client and possibly trigger a manual compaction of 
such regions from the client to improve locality. I found 
HDFSBlocksDistribution which gives an indication of the region servers with bad 
locality but not the regions contained in that region server which are 
responsible. Is there any way to do that?
Thanks,
~Rahul.
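
For anyone hunting for the same thing, below is a rough sketch, not an official HBase tool: it walks a table's regions, reads the HFile block locations straight from HDFS, and compares them with the host currently serving each region, triggering a major compaction of any region below a threshold. It assumes the 0.94-era /hbase/<table>/<encoded-region>/<family> layout with the default root directory, a hypothetical table name "my_table", and an arbitrary 0.5 locality cutoff; the hostname comparison is approximate.

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class RegionLocalityReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path tableDir = new Path("/hbase", "my_table");        // hypothetical root dir and table name
    HTable table = new HTable(conf, "my_table");
    HBaseAdmin admin = new HBaseAdmin(conf);

    for (Map.Entry<HRegionInfo, ServerName> e : table.getRegionLocations().entrySet()) {
      HRegionInfo region = e.getKey();
      String host = e.getValue().getHostname();            // RS currently serving the region
      long totalBytes = 0, localBytes = 0;
      Path regionDir = new Path(tableDir, region.getEncodedName());
      for (FileStatus family : fs.listStatus(regionDir)) {
        // only column-family directories hold HFiles; skip .regioninfo, .tmp, etc.
        if (!family.isDir() || family.getPath().getName().startsWith(".")) {
          continue;
        }
        for (FileStatus hfile : fs.listStatus(family.getPath())) {
          for (BlockLocation loc : fs.getFileBlockLocations(hfile, 0, hfile.getLen())) {
            totalBytes += loc.getLength();
            for (String blockHost : loc.getHosts()) {
              if (blockHost.startsWith(host) || host.startsWith(blockHost)) {
                localBytes += loc.getLength();             // rough hostname match
                break;
              }
            }
          }
        }
      }
      double locality = totalBytes == 0 ? 1.0 : (double) localBytes / totalBytes;
      System.out.printf("%s on %s locality=%.2f%n", region.getEncodedName(), host, locality);
      if (locality < 0.5) {                                // arbitrary threshold for this sketch
        admin.majorCompact(region.getRegionName());        // rewrite the region's HFiles locally
      }
    }
    admin.close();
    table.close();
  }
}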

   

 On Saturday, December 27, 2014 2:13 AM, lars hofhansl la...@apache.org 
wrote:
   

 There should be logic that attempts to restore the regions on the region 
servers that had them last.
Note that the master can only assign regions to region servers that have 
reported in. For that reason the master waits a bit (4.5s by default) for 
region servers to report in after a master start before it starts assigning 
regions. Maybe in your case that time is too short? You can also configure the 
master to wait for a certain number of region servers to report in.
If, after you have checked that, it is still not working, could you file a jira 
outlining the details and steps to reproduce?

In any case, if the master has to assign the regions to a subset of the region 
servers, it has no choice but to break locality. Then, when the remaining region 
servers sign in, in 0.94 there is no logic to maintain locality when the cluster 
is balanced. In 0.98 the stochastic balancer uses locality as one of its 
parameters, although I have personally seen issues with that which I still need 
to investigate.
-- Lars

      From: Rahul Ravindran rahu...@yahoo.com.INVALID
 To: user@hbase.apache.org user@hbase.apache.org 
 Sent: Thursday, December 25, 2014 11:37 PM
 Subject: Determining regions with low HDFS locality index
  
Hi,   When an HBase RS goes down (possibly because of hardware issues, etc.), the 
regions get moved off that machine to other region servers. However, since the 
new region servers do not have the backing HFiles, data locality for the newly 
transitioned regions is not great, and hence some of our jobs are a lot slower 
on these regions. Is there an API for me to determine the regions within an RS 
which are responsible for low HDFS locality, for which I could trigger a 
compaction to improve locality?
I took a look at HDFSBlocksDistribution from which I can determine the RS with 
low HDFS locality. But, going from the RS level to the specific region which is 
responsible, seems harder. I could try to look at the backing hfiles and 
determine locality using HDFS, but that seems roundabout. Any suggestions?
I am running Hbase 0.94.15 with CDH 4.6
~Rahul. 



   

Determining regions with low HDFS locality index

2014-12-25 Thread Rahul Ravindran
Hi,   When an HBase RS goes down (possibly because of hardware issues, etc.), the 
regions get moved off that machine to other region servers. However, since the 
new region servers do not have the backing HFiles, data locality for the newly 
transitioned regions is not great, and hence some of our jobs are a lot slower 
on these regions. Is there an API for me to determine the regions within an RS 
which are responsible for low HDFS locality, for which I could trigger a 
compaction to improve locality?
I took a look at HDFSBlocksDistribution from which I can determine the RS with 
low HDFS locality. But, going from the RS level to the specific region which is 
responsible, seems harder. I could try to look at the backing hfiles and 
determine locality using HDFS, but that seems roundabout. Any suggestions?
I am running Hbase 0.94.15 with CDH 4.6
~Rahul. 

[jira] [Created] (FLUME-2394) Command line argument to disable monitoring for config changes

2014-06-03 Thread Rahul Ravindran (JIRA)
Rahul Ravindran created FLUME-2394:
--

 Summary: Command line argument to disable monitoring for config 
changes
 Key: FLUME-2394
 URL: https://issues.apache.org/jira/browse/FLUME-2394
 Project: Flume
  Issue Type: New Feature
  Components: Configuration
Affects Versions: v1.5.0
Reporter: Rahul Ravindran
 Fix For: v1.6.0


Flume monitors for changes to the config file and attempts to re-initialize 
sources/sinks/channels based on detected changes. However, this does not work 
for all config values, and it is also undesirable in a lot of production 
environments where puppet/chef modifies the config and would likely restart 
Flume anyway. It would be good to have an optional command line argument which 
would disable this monitoring and require Flume to be restarted for config changes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (FLUME-2394) Command line argument to disable monitoring for config changes

2014-06-03 Thread Rahul Ravindran (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Ravindran updated FLUME-2394:
---

Description: Flume monitors for changes to the config file and attempts to 
re-initialize source/sinks/channels based on detected changes. However, this 
does not work for all config values, and also in undesirable in a lot of 
production environments where puppet/chef modifies the config, and likely 
restarts flume. It would be good to have an optional command line argument 
which would disable this monitoring and require flume to be restarted for 
config changes. We can control the restart using variety of orchestration 
mechanisms  (was: Flume monitors for changes to the config file and attempts to 
re-initialize source/sinks/channels based on detected changes. However, this 
does not work for all config values, and also in undesirable in a lot of 
production environments where puppet/chef modifies the config, and likely 
restart flume. It would be good to have an optional command line argument which 
would disable this monitoring and require flume to be restarted for config 
changes.)

 Command line argument to disable monitoring for config changes
 --

 Key: FLUME-2394
 URL: https://issues.apache.org/jira/browse/FLUME-2394
 Project: Flume
  Issue Type: New Feature
  Components: Configuration
Affects Versions: v1.5.0
Reporter: Rahul Ravindran
 Fix For: v1.6.0


 Flume monitors for changes to the config file and attempts to re-initialize 
 source/sinks/channels based on detected changes. However, this does not work 
 for all config values, and also in undesirable in a lot of production 
 environments where puppet/chef modifies the config, and likely restarts 
 flume. It would be good to have an optional command line argument which would 
 disable this monitoring and require flume to be restarted for config changes. 
 We can control the restart using variety of orchestration mechanisms



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (FLUME-2395) Flume does not shutdown cleanly on sending a term signal when it is receiving events

2014-06-03 Thread Rahul Ravindran (JIRA)
Rahul Ravindran created FLUME-2395:
--

 Summary: Flume does not shutdown cleanly on sending a term signal 
when it is receiving events
 Key: FLUME-2395
 URL: https://issues.apache.org/jira/browse/FLUME-2395
 Project: Flume
  Issue Type: Bug
Affects Versions: v1.5.0
Reporter: Rahul Ravindran


Running Flume with an Avro source, file channel, and Avro sink.

Generated a reasonably high load where Flume is receiving Avro events via the 
Avro source.
Now, send a kill to the Flume pid. Flume does not die.

Doing the same after using iptables to block the port on the running Flume 
process ensures that the Flume process does die gracefully on receiving a TERM 
signal.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Error when reading parquet file

2014-05-01 Thread Rahul Ravindran
Hello,
  I created a parquet file out of MR and attempted to use Drill to query the 
file.

select * from /tmp/part-00.parquet
. . . . . . . . . . . . . . . . .  ;
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
running query.[error_id: a3ab05fe-0828-4276-9b40-646e1bd69342
endpoint {
  address: ringldr-v01-00a.bento8.btrll.com
  user_port: 31010
  bit_port: 32011
}
error_type: 0
message: Failure while running fragment.  NullPointerException
]
java.lang.RuntimeException: org.apache.drill.exec.rpc.RpcException: Remote 
failure while running query.[error_id: a3ab05fe-0828-4276-9b40-646e1bd69342
endpoint {
  address: ringldr-v01-00a.bento8.btrll.com
  user_port: 31010
  bit_port: 32011
}
error_type: 0
message: Failure while running fragment.  NullPointerException
]
at 
org.apache.drill.sql.client.full.ResultEnumerator.moveNext(ResultEnumerator.java:44)
at 
net.hydromatic.optiq.runtime.ObjectEnumeratorCursor.next(ObjectEnumeratorCursor.java:44)
at net.hydromatic.optiq.jdbc.OptiqResultSet.next(OptiqResultSet.java:162)
at sqlline.SqlLine$BufferedRows.init(SqlLine.java:2499)
at sqlline.SqlLine.print(SqlLine.java:1886)
at sqlline.SqlLine$Commands.execute(SqlLine.java:3835)
at sqlline.SqlLine$Commands.sql(SqlLine.java:3738)
at sqlline.SqlLine.dispatch(SqlLine.java:882)
at sqlline.SqlLine.begin(SqlLine.java:717)
at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:460)
at sqlline.SqlLine.main(SqlLine.java:443)
Caused by: org.apache.drill.exec.rpc.RpcException: Remote failure while running 
query.[error_id: a3ab05fe-0828-4276-9b40-646e1bd69342
endpoint {
  address: ringldr-v01-00a.bento8.btrll.com
  user_port: 31010
  bit_port: 32011
}
error_type: 0
message: Failure while running fragment.  NullPointerException
]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler.batchArrived(QueryResultHandler.java:72)
at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:79)
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:48)
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:33)
at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:142)
at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:127)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334)
at 
io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at 
io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334)
at 
io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173)
at 
io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:334)
at 
io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:320)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:497)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:465)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:359)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Thread.java:745)


Experience with HBASE-8283 and lots of small hfile

2014-04-03 Thread Rahul Ravindran
Hi, 
  We are currently on 0.94.2 (CDH 4.2.1) and would likely upgrade to 0.94.15 
(CDH 4.6), primarily to use the above fix. We have turned off automatic major 
compactions. We load data into an HBase table every 2 minutes. Currently, we 
are not using bulk load since it created compaction issues. We noticed 
HBASE-8283 and could move to use this. Any gotchas on using this in production? 
Since we could create a new HFile every 2 minutes, we would soon have a 
scenario where we have a lot of HFiles. Would triggering a non-major 
compaction (using 
https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#compact(byte[]))
 periodically be a reasonable compromise, along with enabling HBASE-8283?


Thanks,
~Rahul.
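
A minimal sketch of the periodic trigger being considered, assuming a hypothetical table name "my_table" and an arbitrary 30-minute period; HBaseAdmin.compact only queues a request, and the regionserver's compaction selection still decides which store files actually get merged.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PeriodicMinorCompaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final HBaseAdmin admin = new HBaseAdmin(conf);
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          // Queues a (minor) compaction request for the table; the server-side
          // selection policy decides which store files are merged.
          admin.compact(Bytes.toBytes("my_table"));
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }, 0, 30, TimeUnit.MINUTES);
  }
}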

Flume error in FileChannel

2013-06-27 Thread Rahul Ravindran
Hi,
  We are using CDH Flume 1.3 (which ships with CDH 4.2.1). We see this error in our 
Flume logs on our production system, and restarting Flume did not help. Looking 
at the Flume code, it appears to expect the byte to be an OPERATION, but it 
is not. Any ideas on what happened?

Thanks,
~Rahul.


27 Jun 2013 05:58:12,246 INFO  [Log-BackgroundWorker-ch3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-780.meta currentPosition = 68617970, logWriteOrderID = 
1400053754907
27 Jun 2013 05:58:12,248 INFO  [Log-BackgroundWorker-ch3] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume3/data/log-780 position: 68617970 logWriteOrderID: 1400053754907
27 Jun 2013 05:58:12,529 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:108)  
- Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 43941
27 Jun 2013 05:58:12,531 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:120)  - 
Updating checkpoint metadata: logWriteOrderID: 1400053760540, queueSize: 0, 
queueHead: 16264802
27 Jun 2013 05:58:12,583 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-786.meta currentPosition = 66046989, logWriteOrderID = 
1400053760540
27 Jun 2013 05:58:12,585 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume2/data/log-786 position: 66046989 logWriteOrderID: 1400053760540
27 Jun 2013 05:58:17,679 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:108)  
- Start checkpoint for /flume1/checkpoint/checkpoint, elements to sync = 225955
27 Jun 2013 05:58:17,682 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:120)  - 
Updating checkpoint metadata: logWriteOrderID: 1400053832535, queueSize: 
7255426, queueHead: 1778328
27 Jun 2013 05:58:17,736 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-781.meta currentPosition = 652840345, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:17,738 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume1/data/log-781 position: 652840345 logWriteOrderID: 
1400053832535
27 Jun 2013 05:58:17,739 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-779
27 Jun 2013 05:58:17,745 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-779.meta currentPosition = 1599537606, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:17,746 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-780
27 Jun 2013 05:58:17,752 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-780.meta currentPosition = 1610002802, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:25,538 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink.process:457)  - process failed
java.lang.IllegalStateException: 1
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at 
org.apache.flume.channel.file.LogFile$RandomReader.get(LogFile.java:335)
        at org.apache.flume.channel.file.Log.get(Log.java:478)
        at 
org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:500)
        at 
org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
        at 
org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
        at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
27 Jun 2013 05:58:25,558 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. 
Exception follows.
org.apache.flume.EventDeliveryException: java.lang.IllegalStateException: 1
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
        at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalStateException: 1
        at 

Re: Flume error in FileChannel

2013-06-27 Thread Rahul Ravindran
There was no disk space issue and we ran fsck on the disks and did not find any 
file corruption issue. Did not see anything weird at that point in time either



 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org user@flume.apache.org; Rahul Ravindran 
rahu...@yahoo.com 
Sent: Thursday, June 27, 2013 11:24 AM
Subject: Re: Flume error in FileChannel
 


Looks like the file may have been corrupted. Can you verify if you are out of 
disk space or can see something that might have caused the data to be corrupted?

Hari



On Thu, Jun 27, 2013 at 6:41 AM, Rahul Ravindran rahu...@yahoo.com wrote:

Hi,
  We are using CDH flume 1.3 (which ships with 4.2.1). We see this error in 
our flume logs in our production system and restarting flume did not help. 
Looking at the flume code, it appears to be expecting the byte to be an 
OPERATION, but is not. Any ideas on what happened?


Thanks,
~Rahul.




27 Jun 2013 05:58:12,246 INFO  [Log-BackgroundWorker-ch3] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-780.meta currentPosition = 68617970, logWriteOrderID = 
1400053754907
27 Jun 2013 05:58:12,248 INFO  [Log-BackgroundWorker-ch3] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume3/data/log-780 position: 68617970 logWriteOrderID: 
1400053754907
27 Jun 2013 05:58:12,529 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:108) 
 - Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 43941
27 Jun 2013 05:58:12,531 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:120)  - 
Updating checkpoint metadata: logWriteOrderID: 1400053760540, queueSize: 0, 
queueHead: 16264802
27 Jun 2013 05:58:12,583 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-786.meta currentPosition = 66046989, logWriteOrderID = 
1400053760540
27 Jun 2013 05:58:12,585 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume2/data/log-786 position: 66046989 logWriteOrderID: 
1400053760540
27 Jun 2013 05:58:17,679 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.beginCheckpoint:108) 
 - Start checkpoint for /flume1/checkpoint/checkpoint, elements to sync = 
225955
27 Jun 2013 05:58:17,682 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:120)  - 
Updating checkpoint metadata: logWriteOrderID: 1400053832535, queueSize: 
7255426, queueHead: 1778328
27 Jun 2013 05:58:17,736 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-781.meta currentPosition = 652840345, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:17,738 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.Log.writeCheckpoint:898)  - Updated checkpoint 
for file: /flume1/data/log-781 position: 652840345 logWriteOrderID: 
1400053832535
27 Jun 2013 05:58:17,739 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-779
27 Jun 2013 05:58:17,745 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-779.meta currentPosition = 1599537606, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:17,746 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-780
27 Jun 2013 05:58:17,752 INFO  [Log-BackgroundWorker-ch1] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-780.meta currentPosition = 1610002802, logWriteOrderID = 
1400053832535
27 Jun 2013 05:58:25,538 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink.process:457)  - process failed
java.lang.IllegalStateException: 1
        at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at 
org.apache.flume.channel.file.LogFile$RandomReader.get(LogFile.java:335)
        at org.apache.flume.channel.file.Log.get(Log.java:478)
        at 
org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:500)
        at 
org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
        at 
org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
        at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run

Downside of too many HFiles

2013-06-12 Thread Rahul Ravindran
Hello,
I am trying to understand the downsides of having a large number of HFiles as a 
result of a large hbase.hstore.compactionThreshold.

  This delays major compaction. However, the amount of data that needs to be 
read and re-written as a single HFile during major compaction will remain the 
same unless we have a large number of deletes or expired rows.

I understand that random reads will be affected since each HFile may be a 
candidate for the row, but is there any other downside I am missing?


~Rahul.

Re: Scan + Gets are disk bound

2013-06-05 Thread Rahul Ravindran
Thanks for the approach you suggested, Asaf. This is definitely very promising. 
Our use case is that we have a raw stream of events which may have duplicates. 
After our HBase + MR processing, we would emit a de-duped stream (with 
duplicates eliminated) for later processing. Let me see if I understand 
your approach correctly:
* During major compaction, we emit only the earliest event. I 
understand this.
* Between major compactions, we would need to return only the earliest 
event in the scan. However, we would no longer take advantage of the time-range 
scan, since we would need to consider previously compacted files as well (an 
earlier duplicate could exist in a previously major-compacted HFile, hence we 
need to skip returning this row in the scan). This would mean the scan would 
need to be a full-table scan, or we perform an exists() call in the pre-scan 
hook for an earlier version of the row? 
Thanks,
~Rahul.



 From: Asaf Mesika asaf.mes...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org; Rahul Ravindran 
rahu...@yahoo.com 
Sent: Tuesday, June 4, 2013 10:51 PM
Subject: Re: Scan + Gets are disk bound
 




On Tuesday, June 4, 2013, Rahul Ravindran  wrote:

Hi,

We are relatively new to Hbase, and we are hitting a roadblock on our scan 
performance. I searched through the email archives and applied a bunch of the 
recommendations there, but they did not improve much. So, I am hoping I am 
missing something which you could guide me towards. Thanks in advance.

We are currently writing data and reading in an almost continuous mode (stream 
of data written into an HBase table and then we run a time-based MR on top of 
this Table). We currently were backed up and about 1.5 TB of data was loaded 
into the table and we began performing time-based scan MRs in 10 minute time 
intervals(startTime and endTime interval is 10 minutes). Most of the 10 minute 
interval had about 100 GB of data to process. 

Our workflow was to primarily eliminate duplicates from this table. We have  
maxVersions = 5 for the table. We use TableInputFormat to perform the 
time-based scan to ensure data locality. In the mapper, we check if there 
exists a previous version of the row in a time period earlier to the timestamp 
of the input row. If not, we emit that row.
If I understand correctly, for a rowkey R, column family F, column qualifier C, 
if you have two values with time stamp 13:00 and 13:02, you want to remove the 
value associated with 13:02.

The best way to do this is to write a simple RegionObserver coprocessor which 
hooks into the compaction process (preCompact, for instance). In there, simply 
emit, for any given R, F, C, only the earliest timestamp value (the last one, 
since timestamps are ordered descending), and that's it.
It's a very effective way, since you are riding on top of an existing process 
which reads the values either way, so you are not paying the price of reading 
them again in your MR job. 
Also, in between major compactions, you can implement the preScan hook in 
the region observer, so you'll pick up only the earliest timestamp value, thus 
achieving the same result for your client even though you haven't removed those 
values yet.

I've implemented this for counters delayed aggregations, and it works great in 
production.
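
To make the idea concrete, here is a small sketch of the per-column selection such a coprocessor would apply. It shows only the filtering step; wiring it into preCompact (or the scanner hooks for the in-between case), and handling delete markers and TTLs, is left out.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeepEarliestVersion {
  // Given KeyValues in HBase sort order (row, family, qualifier, newest timestamp
  // first), keep only the oldest version of each column.
  public static List<KeyValue> keepEarliest(List<KeyValue> sorted) {
    List<KeyValue> out = new ArrayList<KeyValue>();
    KeyValue pending = null;        // most recently seen KV of the current column
    for (KeyValue kv : sorted) {
      if (pending != null && !sameColumn(pending, kv)) {
        out.add(pending);           // column changed: the last KV seen was its oldest version
      }
      pending = kv;                 // versions are newest-first, so keep overwriting
    }
    if (pending != null) {
      out.add(pending);             // flush the final column
    }
    return out;
  }

  private static boolean sameColumn(KeyValue a, KeyValue b) {
    return Bytes.equals(a.getRow(), b.getRow())
        && Bytes.equals(a.getFamily(), b.getFamily())
        && Bytes.equals(a.getQualifier(), b.getQualifier());
  }
}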

 

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned 
off block cache for this table with the expectation that the block index and 
bloom filter will be cached in the block cache. We expect duplicates to be 
rare and hence hope for most of these checks to be fulfilled by the bloom 
filter. Unfortunately, we notice very slow performance on account of being 
disk bound. Looking at jstack, we notice that most of the time, we appear to 
be hitting disk for the block index. We performed a major compaction and 
retried and performance improved some, but not by much. We are processing data 
at about 2 MB per second.

  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 
datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). HBase 
is running with 30 GB Heap size, memstore values being capped at 3 GB and 
flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap 
size(15 GB). We are using SNAPPY for our tables.


A couple of questions:
        * Is the performance of the time-based scan bad after a major 
compaction?

        * What can we do to help alleviate being disk bound? The typical 
answer of adding more RAM does not seem to have helped, or we are missing some 
other config



Below are some of the metrics from a Regionserver webUI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, 
numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, 
totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, 
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, 
readRequestsCount=30589690

Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Hi,

We are relatively new to Hbase, and we are hitting a roadblock on our scan 
performance. I searched through the email archives and applied a bunch of the 
recommendations there, but they did not improve much. So, I am hoping I am 
missing something which you could guide me towards. Thanks in advance.

We are currently writing data and reading in an almost continuous mode (stream 
of data written into an HBase table and then we run a time-based MR on top of 
this Table). We currently were backed up and about 1.5 TB of data was loaded 
into the table and we began performing time-based scan MRs in 10 minute time 
intervals(startTime and endTime interval is 10 minutes). Most of the 10 minute 
interval had about 100 GB of data to process. 

Our workflow was to primarily eliminate duplicates from this table. We have  
maxVersions = 5 for the table. We use TableInputFormat to perform the 
time-based scan to ensure data locality. In the mapper, we check if there 
exists a previous version of the row in a time period earlier to the timestamp 
of the input row. If not, we emit that row. 

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned 
off block cache for this table with the expectation that the block index and 
bloom filter will be cached in the block cache. We expect duplicates to be rare 
and hence hope for most of these checks to be fulfilled by the bloom filter. 
Unfortunately, we notice very slow performance on account of being disk bound. 
Looking at jstack, we notice that most of the time, we appear to be hitting 
disk for the block index. We performed a major compaction and retried and 
performance improved some, but not by much. We are processing data at about 2 
MB per second.

  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 
datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). HBase 
is running with 30 GB Heap size, memstore values being capped at 3 GB and flush 
thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap size(15 GB). 
We are using SNAPPY for our tables.


A couple of questions:
* Is the performance of the time-based scan bad after a major 
compaction?

* What can we do to help alleviate being disk bound? The typical answer 
of adding more RAM does not seem to have helped, or we are missing some other 
config



Below are some of the metrics from a Regionserver webUI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, 
numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, 
totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, 
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, 
readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0, 
flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672, blockCacheSizeMB=1604.86, 
blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=2759, 
blockCacheMissCount=25373411, blockCacheEvictedCount=7112, 
blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, 
hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0, 
fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, 
fsReadLatencyHistogramMedian=230609.5, fsReadLatencyHistogram75th=280094.75, 
fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, 
fsReadLatencyHistogram999th=511591146.03,
 fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=42, 
fsPreadLatencyHistogramMedian=954552, fsPreadLatencyHistogram75th=8723662.5, 
fsPreadLatencyHistogram95th=11159637.65, 
fsPreadLatencyHistogram99th=37763281.57, 
fsPreadLatencyHistogram999th=273192813.91, 
fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=114, 
fsWriteLatencyHistogramMedian=374379, fsWriteLatencyHistogram75th=431395.75, 
fsWriteLatencyHistogram95th=576853.8, fsWriteLatencyHistogram99th=1034159.75, 
fsWriteLatencyHistogram999th=5687910.29



key size: 20 bytes 

Table description:
{NAME = 'foo', FAMILIES = [{NAME = 'f', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'ROW',
 REPLICATION_SCOPE = '0', COMPRESSION = 'SNAPPY', VERSIONS = '5', TTL = '2592000',
 MIN_VERSIONS = '0', KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '65536',
 ENCODE_ON_DISK = 'true', IN_MEMORY = 'false', BLOCKCACHE = 'false'}]}  (table enabled: true)

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Our row keys do not contain time. By time-based scans, I mean an MR over the 
HBase table where the Scan object has no startRow or endRow but has a startTime 
and endTime.

Our row key format is MD5 of UUID+UUID, so we expect good distribution. We 
have pre-split initially to prevent any initial hotspotting.
~Rahul.
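
For concreteness, a sketch of that kind of time-range-only scan wired up through TableMapReduceUtil/TableInputFormat; the table name, window length, caching values, and empty mapper are illustrative rather than the actual job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class TimeRangeScanJob {
  // The real mapper would do the "does an earlier version exist" check described
  // in the post; it is left empty here.
  static class DedupMapper extends TableMapper<ImmutableBytesWritable, Result> {
  }

  public static void main(String[] args) throws Exception {
    long end = System.currentTimeMillis();
    long start = end - 10 * 60 * 1000L;           // a 10-minute window, as in the post

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "dedup-10min-window");
    job.setJarByClass(TimeRangeScanJob.class);

    Scan scan = new Scan();                       // no startRow/endRow: full key space
    scan.setTimeRange(start, end);                // only the time window is constrained
    scan.setCaching(500);                         // larger scanner caching for MR scans
    scan.setCacheBlocks(false);                   // don't churn the block cache from MR

    TableMapReduceUtil.initTableMapperJob("my_table", scan, DedupMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}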



 From: anil gupta anilgupt...@gmail.com
To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Tuesday, June 4, 2013 9:31 PM
Subject: Re: Scan + Gets are disk bound
 







On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com wrote:

Hi,

We are relatively new to Hbase, and we are hitting a roadblock on our scan 
performance. I searched through the email archives and applied a bunch of the 
recommendations there, but they did not improve much. So, I am hoping I am 
missing something which you could guide me towards. Thanks in advance.

We are currently writing data and reading in an almost continuous mode (stream 
of data written into an HBase table and then we run a time-based MR on top of 
this Table). We currently were backed up and about 1.5 TB of data was loaded 
into the table and we began performing time-based scan MRs in 10 minute time 
intervals(startTime and endTime interval is 10 minutes). Most of the 10 minute 
interval had about 100 GB of data to process. 

Our workflow was to primarily eliminate duplicates from this table. We have  
maxVersions = 5 for the table. We use TableInputFormat to perform the 
time-based scan to ensure data locality. In the mapper, we check if there 
exists a previous version of the row in a time period earlier to the timestamp 
of the input row. If not, we emit that row. 

We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence turned 
off block cache for this table with the expectation that the block index and 
bloom filter will be cached in the block cache. We expect duplicates to be 
rare and hence hope for most of these checks to be fulfilled by the bloom 
filter. Unfortunately, we notice very slow performance on account of being 
disk bound. Looking at jstack, we notice that most of the time, we appear to 
be hitting disk for the block index. We performed a major compaction and 
retried and performance improved some, but not by much. We are processing data 
at about 2 MB per second.

  We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8 
datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM). 
Anil: You don't have the right balance between disk, CPU and RAM. You have a lot 
of CPU and RAM but very few disks. Usually, it's better to have a 
disk/CPU-core ratio near 0.6-0.8; yours is around 0.13. This seems to be the 
biggest reason for your problem.

HBase is running with 30 GB Heap size, memstore values being capped at 3 GB and 
flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total heap size(15 
GB). We are using SNAPPY for our tables.


A couple of questions:
        * Is the performance of the time-based scan bad after a major 
compaction?

Anil: In general, time-based scans (I am assuming you have built your rowkey on 
the timestamp) are not good for HBase because of region hot-spotting. Have 
you tried setting the scanner caching to a higher number?


        * What can we do to help alleviate being disk bound? The typical 
answer of adding more RAM does not seem to have helped, or we are missing some 
other config

Anil: Try adding more disks to your machines. 




Below are some of the metrics from a Regionserver webUI:

requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60, 
numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131, 
totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675, 
memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, 
readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0, 
flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672, blockCacheSizeMB=1604.86, 
blockCacheFreeMB=13731.24, blockCacheCount=11817, blockCacheHitCount=2759, 
blockCacheMissCount=25373411, blockCacheEvictedCount=7112, 
blockCacheHitRatio=52%, blockCacheHitCachingRatio=72%, 
hdfsBlocksLocalityIndex=91, slowHLogAppendCount=0, 
fsReadLatencyHistogramMean=15409428.56, fsReadLatencyHistogramCount=1559927, 
fsReadLatencyHistogramMedian=230609.5, fsReadLatencyHistogram75th=280094.75, 
fsReadLatencyHistogram95th=9574280.4, fsReadLatencyHistogram99th=100981301.2, 
fsReadLatencyHistogram999th=511591146.03,
 fsPreadLatencyHistogramMean=3895616.6, fsPreadLatencyHistogramCount=42, 
fsPreadLatencyHistogramMedian=954552, fsPreadLatencyHistogram75th=8723662.5, 
fsPreadLatencyHistogram95th=11159637.65, 
fsPreadLatencyHistogram99th=37763281.57, 
fsPreadLatencyHistogram999th=273192813.91, 
fsWriteLatencyHistogramMean=6124343.91, fsWriteLatencyHistogramCount=114, 
fsWriteLatencyHistogramMedian=374379, fsWriteLatencyHistogram75th=431395.75, 
fsWriteLatencyHistogram95th

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Thanks for that confirmation. This is what we hypothesized as well.

So, if we are dependent on timerange scans, we need to completely avoid major 
compaction and depend only on minor compactions? Is there any downside? We do 
have a TTL set on all the rows in the table.
~Rahul.



 From: Anoop John anoop.hb...@gmail.com
To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com 
Cc: anil gupta anilgupt...@gmail.com 
Sent: Tuesday, June 4, 2013 10:44 PM
Subject: Re: Scan + Gets are disk bound
 

When you set a time range on the Scan, some files can get skipped based on the
max/min timestamp values in each file. That said, when you do a major compaction
and then scan based on a time range, I don't think you will get much advantage.



-Anoop-

On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran rahu...@yahoo.com wrote:

 Our row-keys do not contain time. By time-based scans, I mean, an MR over
 the Hbase table where the scan object has no startRow or endRow but has a
 startTime and endTime.

 Our row key format is MD5 of UUID+UUID, so, we expect good distribution.
 We have pre-split initially to prevent any initial hotspotting.
 ~Rahul.


 
  From: anil gupta anilgupt...@gmail.com
 To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, June 4, 2013 9:31 PM
 Subject: Re: Scan + Gets are disk bound








 On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran rahu...@yahoo.com
 wrote:

 Hi,
 
 We are relatively new to Hbase, and we are hitting a roadblock on our
 scan performance. I searched through the email archives and applied a bunch
 of the recommendations there, but they did not improve much. So, I am
 hoping I am missing something which you could guide me towards. Thanks in
 advance.
 
 We are currently writing data and reading in an almost continuous mode
 (stream of data written into an HBase table and then we run a time-based MR
 on top of this Table). We currently were backed up and about 1.5 TB of data
 was loaded into the table and we began performing time-based scan MRs in 10
 minute time intervals(startTime and endTime interval is 10 minutes). Most
 of the 10 minute interval had about 100 GB of data to process.
 
 Our workflow was to primarily eliminate duplicates from this table. We
 have  maxVersions = 5 for the table. We use TableInputFormat to perform the
 time-based scan to ensure data locality. In the mapper, we check if there
 exists a previous version of the row in a time period earlier to the
 timestamp of the input row. If not, we emit that row.
 
 We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
 turned off block cache for this table with the expectation that the block
 index and bloom filter will be cached in the block cache. We expect
 duplicates to be rare and hence hope for most of these checks to be
 fulfilled by the bloom filter. Unfortunately, we notice very slow
 performance on account of being disk bound. Looking at jstack, we notice
 that most of the time, we appear to be hitting disk for the block index. We
 performed a major compaction and retried and performance improved some, but
 not by much. We are processing data at about 2 MB per second.
 
   We are using CDH 4.2.1 HBase 0.94.2 and HDFS 2.0.0 running with 8
 datanodes/regionservers(each with 32 cores, 4x1TB disks and 60 GB RAM).
 Anil: You dont have the right balance between disk,cpu and ram. You have
 too much of CPU, RAM but very less NUMBER of disks. Usually, its better to
 have a Disk/Cpu_core ratio near 0.6-0.8. Your's is around 0.13. This seems
 to be the biggest reason of your problem.

 HBase is running with 30 GB Heap size, memstore values being capped at 3
 GB and flush thresholds being 0.15 and 0.2. Blockcache is at 0.5 of total
 heap size(15 GB). We are using SNAPPY for our tables.
 
 
 A couple of questions:
         * Is the performance of the time-based scan bad after a major
 compaction?
 
 Anil: In general, TimeBased(i am assuming you have built your rowkey on
 timestamp) scans are not good for HBase because of region hot-spotting.
 Have you tried setting the ScannerCaching to a higher number?


         * What can we do to help alleviate being disk bound? The typical
 answer of adding more RAM does not seem to have helped, or we are missing
 some other config
 
 Anil: Try adding more disks to your machines.


 
 
 Below are some of the metrics from a Regionserver webUI:
 
 requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
 numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
 totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
 memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
 readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
 flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
 blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
 blockCacheHitCount=2759, blockCacheMissCount=25373411,
 blockCacheEvictedCount

Flume 1.4 release

2013-05-21 Thread Rahul Ravindran
Hi,
  Is there a rough estimate on when 1.4 may be shipped? We were primarily 
looking for https://issues.apache.org/jira/browse/FLUME-997 and are perhaps 
looking to port that to 1.3.1, or to use 1.4 if it is likely to ship sometime 
soon (by end of June).
~Rahul.

Re: IOException with HDFS-Sink:flushOrSync

2013-05-13 Thread Rahul Ravindran
Pinging again since this has been happening a lot more frequently recently



 From: Rahul Ravindran rahu...@yahoo.com
To: User-flume user@flume.apache.org 
Sent: Tuesday, May 7, 2013 8:42 AM
Subject: IOException with HDFS-Sink:flushOrSync
 


Hi,
   We have noticed this a few times now where we appear to have an IOException 
from HDFS and this stops draining the channel until the flume process is 
restarted. Below are the logs: namenode-v01-00b is the active namenode 
(namenode-v01-00a is standby). We are using Quorum Journal Manager for our 
Namenode HA, but there was no Namenode failover which was initiated. If this is 
an expected error, should flume handle it and gracefully retry (thereby not 
requiring a restart)?
Thanks,
~Rahul.

7 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException 
writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020; ). Closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp)
 and rethrowing exception.
07 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException 
while closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp).
 Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020;
  at 
org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at 
org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:02,495 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
java.io.IOException: IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020;
  at 
org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at 
org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java

Re: IOException with HDFS-Sink:flushOrSync

2013-05-13 Thread Rahul Ravindran
Thanks Hari for your help in this. Appreciate it.

We will work towards upgrading to CDH 4.2.1 soon, and hopefully this issue will 
be resolved.

~Rahul.



 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org user@flume.apache.org 
Sent: Monday, May 13, 2013 7:58 PM
Subject: Re: IOException with HDFS-Sink:flushOrSync
 


The patch also made it to Hadoop 2.0.3.

On Monday, May 13, 2013, Hari Shreedharan  wrote:

Looks like CDH4.2.1 does have that patch: 
http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.2.1.CHANGES.txt (but 
it was not in CDH4.1.2)




Hari


-- 
Hari Shreedharan


On Monday, May 13, 2013 at 7:23 PM, Rahul Ravindran wrote:
We are using CDH 4.1.2 (Hadoop version 2.0.0). Looks like CDH 4.2.1 also uses 
the same Hadoop version. Any suggestions on mitigations?

Sent from my phone. Excuse the terseness.

On May 13, 2013, at 7:12 PM, Hari Shreedharan hshreedha...@cloudera.com 
wrote:


What version of Hadoop are you using? Looks like you are getting hit by 
https://issues.apache.org/jira/browse/HADOOP-6762. 




Hari


-- 
Hari Shreedharan


On Monday, May 13, 2013 at 6:50 PM, Matt Wise wrote:
So we've just had this happen twice to two different flume machines... we're 
using the HDFS sink as well, but ours is writing to an S3N:// URL. Both 
times our sink stopped working and the filechannel clogged up immediately 
causing serious problems. A restart of Flume worked -- but the filechannel 
was so backed up at that point that it took a good long while to get Flume 
started up again properly.


Anyone else seeing this behavior?


(oh, and we're running flume 1.3.0)

On May 7, 2013, at 8:42 AM, Rahul Ravindran rahu...@yahoo.com wrote:

Hi,
   We have noticed this a few times now where we appear to have an 
IOException from HDFS and this stops draining the channel until the flume 
process is restarted. Below are the logs: namenode-v01-00b is the active 
namenode (namenode-v01-00a is standby). We are using Quorum Journal 
Manager for our Namenode HA, but there was no Namenode failover which was 
initiated. If this is an expected error, should flume handle it and 
gracefully retry (thereby not requiring a restart)?
Thanks,
~Rahul.


7 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException 
writing to HDFSWriter (IOException flush:java.io.IOException: Failed on 
local exception: java.nio.channels.ClosedByInterruptException; Host 
Details : local host is: flumefs-v01-10a.a.com/10.40.85.170; destination 
host is: namenode-v01-00a.a.com:8020; ). Closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp)
 and rethrowing exception.
07 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException 
while closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp).
 Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on 
local exception: java.nio.channels.ClosedByInterruptException; Host 
Details : local host is: flumefs-v01-10a.a.com/10.40.85.170; destination 
host is: namenode-v01-00a.a.com:8020;

Re: Usage of use-fast-replay for FileChannel

2013-05-07 Thread Rahul Ravindran
)  - 
Updating log-862.meta currentPosition = 0, logWriteOrderID = 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.Log.writeCheckpoint:886)  - Updated checkpoint 
for file: /flume1/data/log-862 position: 0 logWriteOrderID: 1385582495372
07 May 2013 04:40:37,240 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-855
07 May 2013 04:40:37,246 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-855.meta currentPosition = 1547225990, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,247 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-856
07 May 2013 04:40:37,253 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-856.meta currentPosition = 892596719, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,255 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-857
07 May 2013 04:40:37,260 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-857.meta currentPosition = 1559591451, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,262 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-858
07 May 2013 04:40:37,267 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-858.meta currentPosition = 1550429668, logWriteOrderID = 
1385582495372
07 May 2013 04:40:37,269 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFile$RandomReader.close:356)  - Closing 
RandomReader /flume1/data/log-859
07 May 2013 04:40:37,274 INFO  [lifecycleSupervisor-1-2] 
(org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint:85)  - 
Updating log-859.meta currentPosition = 1616923189, logWriteOrderID = 
1385582495372



 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org user@flume.apache.org; Rahul Ravindran 
rahu...@yahoo.com 
Sent: Monday, May 6, 2013 9:57 PM
Subject: Re: Usage of use-fast-replay for FileChannel
 


Did you have an issue with the checkpoint such that the entire 6G of data was 
replayed (look for BadCheckpointException in the logs to figure out if the 
channel was stopped in the middle of a checkpoint)?

With the next version of Flume, you should be able to recover even if the 
channel stopped while the checkpoint was being written.

Fast Replay will try to maintain order, but it will require a massive amount of 
memory to run if you have a large number of events. Also, fast replay will only 
run if the checkpoint is corrupt/does not exist.

Hari




On Mon, May 6, 2013 at 9:40 PM, Rahul Ravindran rahu...@yahoo.com wrote:

Hi,
   For FileChannel, how much of a performance improvement in replay times has 
been observed with use-fast-replay? We currently have use-fast-replay set to false 
and were replaying about 6 GB of data; we noticed replay times of about one 
hour. I looked at the code, and it appears that fast-replay does not guarantee 
the same ordering of events during replay. Is this accurate? Are there any 
other downsides to using fast-replay? Any stability concerns?
Thanks,
~Rahul.
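
For reference, a minimal sketch of a file channel with fast replay turned on. The channel name and directories mirror the logs earlier in this thread; the agent name and capacity value are illustrative. As Hari notes above, fast replay only kicks in when the checkpoint is missing or corrupt, and it needs a large amount of heap when there are many events to replay.

agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /flume1/checkpoint
agent1.channels.ch1.dataDirs = /flume1/data
# illustrative capacity; fast replay rebuilds the queue in memory
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.use-fast-replay = true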

IOException with HDFS-Sink:flushOrSync

2013-05-07 Thread Rahul Ravindran
Hi,
   We have noticed this a few times now: we appear to get an IOException 
from HDFS, and this stops the channel from draining until the Flume process is 
restarted. Below are the logs; namenode-v01-00b is the active namenode 
(namenode-v01-00a is standby). We are using Quorum Journal Manager for our 
Namenode HA, but no Namenode failover was initiated. If this is 
an expected error, should Flume handle it and gracefully retry (thereby not 
requiring a restart)?
Thanks,
~Rahul.

7 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException 
writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020; ). Closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp)
 and rethrowing exception.
07 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] 
(org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException 
while closing file 
(hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp).
 Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020;
  at 
org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at 
org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:02,495 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
java.io.IOException: IOException flush:java.io.IOException: Failed on local 
exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
host is: flumefs-v01-10a.a.com/10.40.85.170; destination host is: 
namenode-v01-00a.a.com:8020;
  at 
org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at 
org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:05,350 WARN  [hdfs-hdfs-sink1-call-runner-5] 
(org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException 

Usage of use-fast-replay for FileChannel

2013-05-06 Thread Rahul Ravindran
Hi,
   For FileChannel, how much of a performance improvement in replay times was 
observed with use-fast-replay? We currently have use-fast-replay set to false 
and were replaying about 6 GB of data. We noticed replay times of about one 
hour. I looked at the code and it appears that fast-replay does not guarantee 
the same ordering of events during replay. Is this accurate? Are there any 
other downsides of using fast-replay? Any stability concerns?
Thanks,
~Rahul.

Flume error with HDFSSink when namenode standby is active

2013-02-25 Thread Rahul Ravindran
Hi,
  Flume writes to HDFS (we use the Cloudera 4.1.2 release and Flume 1.3.1) using 
the HDFS nameservice, which points to 2 namenodes (one of which is active and 
the other standby). When the HDFS service is restarted, the namenode which comes 
up first becomes active. If the active namenode was swapped as a result of the 
HDFS restart, we see the error below:

* Do we need to ensure that Flume is shut down prior to an HDFS restart?

* The Hadoop documentation mentioned that using the nameservice as the 
HDFS file destination ensures that the Hadoop client would look at both 
namenodes, determine the currently active one, and then perform 
writes/reads against it. Is this not true with the HDFS sink?
* What is the general practice around what needs to be done with Flume 
when the HDFS service parameters are changed and the service is restarted?


25 Feb 2013 08:26:59,836 WARN  [hdfs-hdfs-sink1-call-runner-5] 
(org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException 
while closing file (hdfs://nameservice1/*/event.1361494307973.tmp). Exception 
follows.
java.net.ConnectException: Call From flume* to namenode-v01-00b.*:8020 failed 
on connection exception: java.net.ConnectException: Connection refused; For 
more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
        at org.apache.hadoop.ipc.Client.call(Client.java:1164)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy11.getAdditionalDatanode(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:312)
        at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy12.getAdditionalDatanode(Unknown Source)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:846)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:958)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:755)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:424)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:523)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:476)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:570)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1213)
        at org.apache.hadoop.ipc.Client.call(Client.java:1140)
        ... 13 more

Re: File Channel error stops flume

2013-02-25 Thread Rahul Ravindran
I have attached the zipped log file at
https://issues.apache.org/jira/browse/FLUME-1928




 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Monday, February 25, 2013 1:30 PM
Subject: Re: File Channel error stops flume
 

Can you send your full logs? I suspect the channel did a full replay because it 
was stopped in the middle of a checkpoint when it was restarted. (If it did, the 
logs would show a BadCheckpointException.) 


Hari


-- 
Hari Shreedharan

On Monday, February 25, 2013 at 1:20 PM, Rahul Ravindran wrote:
Thanks, Hari. I had waited for 20 minutes and it did not appear to change. Now, 
after more than an hour, I see it working.




 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Monday, February 25, 2013 12:46 PM
Subject: Re: File Channel error stops flume
 

Rahul, 


Those messages actually just suggest that your channel is replaying. The 
channel will complete the replay and the agent will start the sinks once the 
channel is ready. It might take a few minutes based on how many events you 
have in the channel.




Hari


-- 
Hari Shreedharan


On Monday, February 25, 2013 at 12:07 PM, Rahul Ravindran wrote:
Hi,
   I modified a parameter of the HDFS sink in a Flume config (added an 
idleInterval) on 2 machines. Things worked fine on one but not on the other. 
I tried restarting Flume a couple of times and I continue to see the same log 
statement (shown below) with no writes to HDFS.


25 Feb 2013 08:27:00,174 INFO  [Log-BackgroundWorker-ch2] 
(org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint:109)  - 
Start checkpoint for /flume2/checkpoint/checkpoint, elements to sync = 8506
:%
25 Feb 2013 19:55:51,577 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-17
25 Feb 2013 19:55:51,585 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume1/data/log-17
25 Feb 2013 19:55:51,588 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.tools.DirectMemoryUtils.getDefaultDirectMemorySize:113)  - 
Unable to get maxDirectMemory from VM: NoSuchMethodException: 
sun.misc.VM.maxDirectMemory(null)
25 Feb 2013 19:55:51,592 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.tools.DirectMemoryUtils.allocate:47)  - Direct Memory 
Allocation:  Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 
268435456, Remaining = 268435456
25 Feb 2013 19:55:51,634 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 1622812128
25 Feb 2013 19:55:51,634 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 1622720601
25 Feb 2013 19:55:51,654 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-18
25 Feb 2013 19:55:51,655 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 1622821593
25 Feb 2013 19:55:51,655 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-19
25 Feb 2013 19:55:51,656 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 1622678590
25 Feb 2013 19:55:51,656 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-20
25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 244707334
25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-21
25 Feb 2013 19:55:51,657 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.skipToLastCheckpointPosition:466)
  - fast-forward to checkpoint position: 530601497
25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile$SequentialReader.next:491)  - 
Encountered EOF at 530601497 in /flume2/data/log-21
25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume2/data/log-22
25 Feb 2013 19:55:51,658 INFO  [lifecycleSupervisor-1-1] 
(org.apache.flume.channel.file.ReplayHandler.replayLog:236)  - Replaying 
/flume1/data/log-18
25 Feb 2013 19:55:51,658 WARN  [lifecycleSupervisor-1-0] 
(org.apache.flume.channel.file.LogFile

[jira] [Created] (FLUME-1928) File Channel

2013-02-25 Thread Rahul Ravindran (JIRA)
Rahul Ravindran created FLUME-1928:
--

 Summary: File Channel
 Key: FLUME-1928
 URL: https://issues.apache.org/jira/browse/FLUME-1928
 Project: Flume
  Issue Type: Question
Affects Versions: v1.3.1
Reporter: Rahul Ravindran
Assignee: Hari Shreedharan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (FLUME-1928) File Channel

2013-02-25 Thread Rahul Ravindran (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Ravindran updated FLUME-1928:
---

Attachment: fl.zip

 File Channel
 

 Key: FLUME-1928
 URL: https://issues.apache.org/jira/browse/FLUME-1928
 Project: Flume
  Issue Type: Question
Affects Versions: v1.3.1
Reporter: Rahul Ravindran
Assignee: Hari Shreedharan
 Attachments: fl.zip




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Using HBase for Deduping

2013-02-19 Thread Rahul Ravindran
I could surround it with a try/catch, but then each time I insert a UUID for the 
first time (99% of the time) I would do a checkAndPut(), catch the resulting 
exception and perform a Put; that is 2 operations per reduce invocation, 
which is what I was looking to avoid.
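
For concreteness, here is a minimal sketch of the single-round-trip variant being 
discussed, written against the HBase 0.94-era client API; the table name, column 
family, and qualifier are illustrative assumptions, not details from this thread. 
Passing null as the expected value asks HBase to apply the Put only when the cell 
is absent, and the boolean return value says whether the event was seen for the 
first time, so the common case stays at one RPC instead of a checkAndPut plus a 
Put.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch (HBase 0.94-era API); table/family/qualifier names are assumed.
public class DedupCheckAndPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "event_dedup");      // assumed table name
    try {
      byte[] row  = Bytes.toBytes(args[0]);              // event UUID passed on the command line
      byte[] cf   = Bytes.toBytes("d");                  // assumed column family
      byte[] qual = Bytes.toBytes("seen");               // assumed qualifier

      Put put = new Put(row);
      put.add(cf, qual, Bytes.toBytes(System.currentTimeMillis()));

      // With a null expected value, checkAndPut applies the Put only if the cell
      // does not exist yet, so one call both records the UUID and reports whether
      // this is the first time the event has been seen.
      boolean firstTime = table.checkAndPut(row, cf, qual, null, put);
      System.out.println(firstTime ? "new event - emit it" : "duplicate - drop it");
    } finally {
      table.close();
    }
  }
}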



 From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Friday, February 15, 2013 9:24 AM
Subject: Re: Using HBase for Deduping
 

Interesting. 

Surround with a Try Catch? 

But it sounds like you're on the right path. 

Happy Coding!


On Feb 15, 2013, at 11:12 AM, Rahul Ravindran rahu...@yahoo.com wrote:

I had tried checkAndPut yesterday with a null passed as the value and it had 
thrown an exception when the row did not exist. Perhaps, I was doing something 
wrong. Will try that again, since, yes, I would prefer a checkAndPut().



From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org 
Cc: Rahul Ravindran rahu...@yahoo.com 
Sent: Friday, February 15, 2013 4:36 AM
Subject: Re: Using HBase for Deduping


On Feb 15, 2013, at 3:07 AM, Asaf Mesika asaf.mes...@gmail.com wrote:


Michael, this means read for every write?

Yes and no. 

At the macro level, a read for every write would mean that your client would 
read a record from HBase, and then based on some logic it would either write a 
record, or not. 

So that you have a lot of overhead in the initial get() and then put(). 

At this macro level, with a Check and Put you have less overhead because of a 
single message to HBase.

Internal to HBase, you would still have to check the value in the row, if it 
exists, and then perform an insert or not. 

With respect to your billion events an hour... 

Dividing by 3600 to get the number of events per second, you would have less 
than 300,000 events a second. 

What exactly are you doing and how large are those events? 

Since you are processing these events in a batch job, timing doesn't appear to 
be that important and of course there is also async hbase which may improve 
some of the performance. 

YMMV but this is a good example of the checkAndPut()




On Friday, February 15, 2013, Michael Segel wrote:


What constitutes a duplicate?

An oversimplification is to do an HTable.checkAndPut() where you do the
put if the column doesn't exist.
Then if the row is inserted (TRUE) return value, you push the event.

That will do what you want.

At least at first blush.



On Feb 14, 2013, at 3:24 PM, Viral Bajaria viral.baja...@gmail.com
wrote:


Given the size of the data ( 1B rows) and the frequency of job run (once
per hour), I don't think your most optimal solution is to look up HBase for
every single event. You will benefit more by loading the HBase table
directly in your MR job.

In 1B rows, what's the cardinality? Is it 100M UUIDs? 99% unique UUIDs?

Also, once you have done the unique, are you going to use the data again in
some other way, i.e. online serving of traffic or some other analysis? Or
is this just to compute some unique #'s?

It will be more helpful if you describe your final use case of the computed
data too. Given the amount of back and forth, we can take it off list too
and summarize the conversation for the list.

On Thu, Feb 14, 2013 at 1:07 PM, Rahul Ravindran rahu...@yahoo.com
wrote:


We can't rely on the assumption that event dupes will not dupe outside an
hour boundary. So, your take is that doing a lookup per event within the
MR job is going to be bad?


From: Viral Bajaria viral.baja...@gmail.com
To: Rahul Ravindran rahu...@yahoo.com
Cc: user@hbase.apache.org user@hbase.apache.org
Sent: Thursday, February 14, 2013 12:48 PM
Subject: Re: Using HBase for Deduping

You could do with a 2-pronged approach here, i.e. some MR and some HBase
lookups. I don't think this is the best solution either given the # of
events you will get.

FWIW, the solution below again relies on the assumption that if an event is
duped in the same hour it won't have a dupe outside of that hour boundary.
If it can, then you are better off running an MR job with the current hour
+ another 3 hours of data, or an MR job with the current hour + the HBase
table as input to the job too (i.e. no HBase lookups, just read the HFile
directly).

- Run an MR job which de-dupes events for the current hour, i.e. only runs
on 1 hour worth of data.
- Mark records which you were not able to de-dupe in the current run
- For the records that you were not able to de-dupe, check against HBase
whether you saw that event in the past. If you did, you can drop the
current event or update the event to the new value (based on your business
logic)
- Save all the de-duped events (via HBase bulk upload)

Sorry if I just rambled along, but without knowing the whole problem it's
very tough to come up with a probable solution. So correct my assumptions
and we could drill down more.

Re: Using HBase for Deduping

2013-02-15 Thread Rahul Ravindran
I had tried checkAndPut yesterday with a null passed as the value and it had 
thrown an exception when the row did not exist. Perhaps, I was doing something 
wrong. Will try that again, since, yes, I would prefer a checkAndPut().



 From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org 
Cc: Rahul Ravindran rahu...@yahoo.com 
Sent: Friday, February 15, 2013 4:36 AM
Subject: Re: Using HBase for Deduping
 

On Feb 15, 2013, at 3:07 AM, Asaf Mesika asaf.mes...@gmail.com wrote:

 Michael, this means read for every write?
 
Yes and no. 

At the macro level, a read for every write would mean that your client would 
read a record from HBase, and then based on some logic it would either write a 
record, or not. 

So that you have a lot of overhead in the initial get() and then put(). 

At this macro level, with a Check and Put you have less overhead because of a 
single message to HBase.

Internal to HBase, you would still have to check the value in the row, if it 
exists, and then perform an insert or not. 

With respect to your billion events an hour... 

Dividing by 3600 to get the number of events per second, you would have less 
than 300,000 events a second. 

What exactly are you doing and how large are those events? 

Since you are processing these events in a batch job, timing doesn't appear to 
be that important and of course there is also async hbase which may improve 
some of the performance. 

YMMV but this is a good example of the checkAndPut()



 On Friday, February 15, 2013, Michael Segel wrote:
 
 What constitutes a duplicate?
 
 An oversimplification is to do an HTable.checkAndPut() where you do the
 put if the column doesn't exist.
 Then if the row is inserted (TRUE) return value, you push the event.
 
 That will do what you want.
 
 At least at first blush.
 
 
 
 On Feb 14, 2013, at 3:24 PM, Viral Bajaria viral.baja...@gmail.com
 wrote:
 
 Given the size of the data ( 1B rows) and the frequency of job run (once
 per hour), I don't think your most optimal solution is to lookup HBase
 for
 every single event. You will benefit more by loading the HBase table
 directly in your MR job.
 
 In 1B rows, what's the cardinality ? Is it 100M UUID's ? 99% unique
 UUID's ?
 
 Also once you have done the unique, are you going to use the data again
 in
 some other way i.e. online serving of traffic or some other analysis ? Or
 this is just to compute some unique #'s ?
 
 It will be more helpful if you describe your final use case of the
 computed
 data too. Given the amount of back and forth, we can take it off list too
 and summarize the conversation for the list.
 
 On Thu, Feb 14, 2013 at 1:07 PM, Rahul Ravindran rahu...@yahoo.com
 wrote:
 
 We can't rely on the assumption that event dupes will not dupe outside an
 hour boundary. So, your take is that, doing a lookup per event within
 the
 MR job is going to be bad?
 
 
 
 From: Viral Bajaria viral.baja...@gmail.com
 To: Rahul Ravindran rahu...@yahoo.com
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Thursday, February 14, 2013 12:48 PM
 Subject: Re: Using HBase for Deduping
 
 You could do with a 2-pronged approach here i.e. some MR and some HBase
 lookups. I don't think this is the best solution either given the # of
 events you will get.
 
 FWIW, the solution below again relies on the assumption that if an event
 is
 duped in the same hour it won't have a dupe outside of that hour
 boundary.
 If it can, then you are better off running an MR job with the
 current hour + another 3 hours of data or an MR job with the current
 hour +
 the HBase table as input to the job too (i.e. no HBase lookups, just
 read
 the HFile directly) ?
 
 - Run a MR job which de-dupes events for the current hour i.e. only
 runs on
 1 hour worth of data.
 - Mark records which you were not able to de-dupe in the current run
 - For the records that you were not able to de-dupe, check against HBase
 whether you saw that event in the past. If you did, you can drop the
 current event or update the event to the new value (based on your
 business
 logic)
 - Save all the de-duped events (via HBase bulk upload)
 
 Sorry if I just rambled along, but without knowing the whole problem
 it's
 very tough to come up with a probable solution. So correct my
 assumptions
 and we could drill down more.
 
 Thanks,
 Viral
 
 On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com
 wrote:
 
 Most will be in the same hour. Some will be across 3-6 hours.
 
 Sent from my phone. Excuse the terseness.
 
 On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com
 wrote:
 
 Are all these dupe events expected to be within the same hour or they
 can happen over multiple hours ?
 
 Viral
 From: Rahul Ravindran
 Sent: 2/14/2013 11:41 AM
 To: user@hbase.apache.org
 Subject: Using HBase for Deduping
 Hi,
  We have events which are delivered into our HDFS cluster which may
 be duplicated. Each

Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
Hi,
   We have events which are delivered into our HDFS cluster which may be 
duplicated. Each event has a UUID, and we were hoping to leverage HBase to 
dedupe them. We run a MapReduce job which would perform a lookup for each UUID 
in HBase and then emit the event only if the UUID was absent, and would also 
insert it into the HBase table (this is simplistic; I am leaving out details 
that make this more resilient to failures). My concern is that doing a 
read+write for every event in MR would be slow (we expect around 1 billion 
events every hour). Does anyone use HBase for a similar use case, or is there 
a different approach to achieving the same end result? Any information or 
comments would be great.

Thanks,
~Rahul.

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
Most will be in the same hour. Some will be across 3-6 hours. 

Sent from my phone. Excuse the terseness.

On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com wrote:

 Are all these dupe events expected to be within the same hour or they
 can happen over multiple hours ?
 
 Viral
 From: Rahul Ravindran
 Sent: 2/14/2013 11:41 AM
 To: user@hbase.apache.org
 Subject: Using HBase for Deduping
 Hi,
We have events which are delivered into our HDFS cluster which may
 be duplicated. Each event has a UUID and we were hoping to leverage
 HBase to dedupe them. We run a MapReduce job which would perform a
 lookup for each UUID on HBase and then emit the event only if the UUID
 was absent and would also insert into the HBase table(This is
 simplistic, I am missing out details to make this more resilient to
 failures). My concern is that doing a Read+Write for every event in MR
 would be slow (We expect around 1 Billion events every hour). Does
 anyone use Hbase for a similar use case or is there a different
 approach to achieving the same end result. Any information, comments
 would be great.
 
 Thanks,
 ~Rahul.


Using Hbase for Dedupping

2013-02-14 Thread Rahul Ravindran
Hi,
   We have events which are delivered into our HDFS cluster which may be 
duplicated. Each event has a UUID, and we were hoping to leverage HBase to 
dedupe them. We run a MapReduce job which would perform a lookup for each UUID 
in HBase and then emit the event only if the UUID was absent, and would also 
insert it into the HBase table (this is simplistic; I am leaving out details 
that make this more resilient to failures). My concern is that doing a 
read+write for every event in MR would be slow (we expect around 1 billion 
events every hour). Does anyone use HBase for a similar use case, or is there 
a different approach to achieving the same end result? Any information or 
comments would be great.

Thanks,
~Rahul.

Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
We can't rely on the assumption that event dupes will not dupe outside an hour 
boundary. So, your take is that doing a lookup per event within the MR job is 
going to be bad?



 From: Viral Bajaria viral.baja...@gmail.com
To: Rahul Ravindran rahu...@yahoo.com 
Cc: user@hbase.apache.org user@hbase.apache.org 
Sent: Thursday, February 14, 2013 12:48 PM
Subject: Re: Using HBase for Deduping
 
You could do with a 2-pronged approach here i.e. some MR and some HBase
lookups. I don't think this is the best solution either given the # of
events you will get.

FWIW, the solution below again relies on the assumption that if an event is
duped in the same hour it won't have a dupe outside of that hour boundary.
If it can, then you are better off running an MR job with the
current hour + another 3 hours of data, or an MR job with the current hour +
the HBase table as input to the job too (i.e. no HBase lookups, just read
the HFile directly).

- Run an MR job which de-dupes events for the current hour, i.e. only runs on
1 hour worth of data.
- Mark records which you were not able to de-dupe in the current run
- For the records that you were not able to de-dupe, check against HBase
whether you saw that event in the past. If you did, you can drop the
current event or update the event to the new value (based on your business
logic)
- Save all the de-duped events (via HBase bulk upload)

Sorry if I just rambled along, but without knowing the whole problem it's
very tough to come up with a probable solution. So correct my assumptions
and we could drill down more.

Thanks,
Viral
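
To make the first bullet in Viral's list above concrete, here is a minimal 
sketch of an in-hour dedup job. It assumes, purely for illustration, that each 
event is one text line whose first tab-separated field is its UUID; that layout 
is not something stated in this thread. Cross-hour duplicates would still need 
the HBase check described in the list above.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hedged sketch: de-dupe one hour's worth of events by UUID inside MapReduce.
public class HourlyDedup {

  public static class UuidMapper extends Mapper<Object, Text, Text, Text> {
    private final Text uuid = new Text();
    @Override
    protected void map(Object key, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assumption: the UUID is the first tab-separated field of the event line.
      String[] fields = line.toString().split("\t", 2);
      uuid.set(fields[0]);
      ctx.write(uuid, line);   // group all copies of an event by its UUID
    }
  }

  public static class FirstValueReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text uuid, Iterable<Text> events, Context ctx)
        throws IOException, InterruptedException {
      // Emit only one copy per UUID; duplicates that span hour boundaries would
      // still be caught by the HBase step described in the thread.
      ctx.write(uuid, events.iterator().next());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hourly-dedup");
    job.setJarByClass(HourlyDedup.class);
    job.setMapperClass(UuidMapper.class);
    job.setReducerClass(FirstValueReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // hour's input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // de-duped output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}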

On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com wrote:

 Most will be in the same hour. Some will be across 3-6 hours.

 Sent from my phone. Excuse the terseness.

 On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com
 wrote:

  Are all these dupe events expected to be within the same hour or they
  can happen over multiple hours ?
 
  Viral
  From: Rahul Ravindran
  Sent: 2/14/2013 11:41 AM
  To: user@hbase.apache.org
  Subject: Using HBase for Deduping
  Hi,
     We have events which are delivered into our HDFS cluster which may
  be duplicated. Each event has a UUID and we were hoping to leverage
  HBase to dedupe them. We run a MapReduce job which would perform a
  lookup for each UUID on HBase and then emit the event only if the UUID
  was absent and would also insert into the HBase table(This is
  simplistic, I am missing out details to make this more resilient to
  failures). My concern is that doing a Read+Write for every event in MR
  would be slow (We expect around 1 Billion events every hour). Does
  anyone use Hbase for a similar use case or is there a different
  approach to achieving the same end result. Any information, comments
  would be great.
 
  Thanks,
  ~Rahul.


Re: Using HBase for Deduping

2013-02-14 Thread Rahul Ravindran
checkAndPut() does not work when the row does not exist, or am I missing 
something?

Sent from my phone.Excuse the terseness.

On Feb 14, 2013, at 5:33 PM, Michael Segel michael_se...@hotmail.com wrote:

 What constitutes a duplicate? 
 
 An oversimplification is to do an HTable.checkAndPut() where you do the put 
 if the column doesn't exist. 
 Then if the row is inserted (TRUE) return value, you push the event. 
 
 That will do what you want.
 
 At least at first blush. 
 
 
 
 On Feb 14, 2013, at 3:24 PM, Viral Bajaria viral.baja...@gmail.com wrote:
 
 Given the size of the data ( 1B rows) and the frequency of job run (once
 per hour), I don't think your most optimal solution is to lookup HBase for
 every single event. You will benefit more by loading the HBase table
 directly in your MR job.
 
 In 1B rows, what's the cardinality ? Is it 100M UUID's ? 99% unique UUID's ?
 
 Also once you have done the unique, are you going to use the data again in
 some other way i.e. online serving of traffic or some other analysis ? Or
 this is just to compute some unique #'s ?
 
 It will be more helpful if you describe your final use case of the computed
 data too. Given the amount of back and forth, we can take it off list too
 and summarize the conversation for the list.
 
 On Thu, Feb 14, 2013 at 1:07 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 
 We can't rely on the assumption that event dupes will not dupe outside an
 hour boundary. So, your take is that, doing a lookup per event within the
 MR job is going to be bad?
 
 
 
 From: Viral Bajaria viral.baja...@gmail.com
 To: Rahul Ravindran rahu...@yahoo.com
 Cc: user@hbase.apache.org user@hbase.apache.org
 Sent: Thursday, February 14, 2013 12:48 PM
 Subject: Re: Using HBase for Deduping
 
 You could do with a 2-pronged approach here i.e. some MR and some HBase
 lookups. I don't think this is the best solution either given the # of
 events you will get.
 
 FWIW, the solution below again relies on the assumption that if an event is
 duped in the same hour it won't have a dupe outside of that hour boundary.
 If it can, then you are better off running an MR job with the
 current hour + another 3 hours of data or an MR job with the current hour +
 the HBase table as input to the job too (i.e. no HBase lookups, just read
 the HFile directly) ?
 
 - Run a MR job which de-dupes events for the current hour i.e. only runs on
 1 hour worth of data.
 - Mark records which you were not able to de-dupe in the current run
 - For the records that you were not able to de-dupe, check against HBase
 whether you saw that event in the past. If you did, you can drop the
 current event or update the event to the new value (based on your business
 logic)
 - Save all the de-duped events (via HBase bulk upload)
 
 Sorry if I just rambled along, but without knowing the whole problem it's
 very tough to come up with a probable solution. So correct my assumptions
 and we could drill down more.
 
 Thanks,
 Viral
 
 On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com
 wrote:
 
 Most will be in the same hour. Some will be across 3-6 hours.
 
 Sent from my phone. Excuse the terseness.
 
 On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com
 wrote:
 
 Are all these dupe events expected to be within the same hour or they
 can happen over multiple hours ?
 
 Viral
 From: Rahul Ravindran
 Sent: 2/14/2013 11:41 AM
 To: user@hbase.apache.org
 Subject: Using HBase for Deduping
 Hi,
  We have events which are delivered into our HDFS cluster which may
 be duplicated. Each event has a UUID and we were hoping to leverage
 HBase to dedupe them. We run a MapReduce job which would perform a
 lookup for each UUID on HBase and then emit the event only if the UUID
 was absent and would also insert into the HBase table(This is
 simplistic, I am missing out details to make this more resilient to
 failures). My concern is that doing a Read+Write for every event in MR
 would be slow (We expect around 1 Billion events every hour). Does
 anyone use Hbase for a similar use case or is there a different
 approach to achieving the same end result. Any information, comments
 would be great.
 
 Thanks,
 ~Rahul.
 
 Michael Segel  | (m) 312.755.9623
 
 Segel and Associates
 
 


Re: Security between Avro-source and Avro-sink

2013-02-04 Thread Rahul Ravindran
Re-sending. 



 From: Rahul Ravindran rahu...@yahoo.com
To: User-flume user@flume.apache.org 
Sent: Thursday, January 31, 2013 2:39 PM
Subject: Security between Avro-source and Avro-sink
 

Hi,
   Is there a way to have secure communication between 2 Flume machines (one 
of which has an Avro source which forwards data to an Avro sink)?
Thanks,
~Rahul.

Re: FileChannel error on Flume 1.3.1

2013-02-04 Thread Rahul Ravindran
Hi Brock,
 I created a JIRA https://issues.apache.org/jira/browse/FLUME-1900 which has 
the log file attached.
~Rahul.



 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Saturday, February 2, 2013 4:05 PM
Subject: Re: FileChannel error on Flume 1.3.1
 

Yes, that error looks odd, so I'd like to have a look if possible.  

-- 
Brock Noland

Sent with Sparrow

On Saturday, February 2, 2013 at 4:50 PM, Rahul Ravindran wrote:
Hi,
  No, I did not delete the checkpoint previously. That is what I did to fix 
the problem. 


Searching through the archives, it looks like 
http://mail-archives.apache.org/mod_mbox/flume-user/201211.mbox/%3cda66e7657bfbd949829f3c98d73a54cf18e...@rhv-exrda-s11.corp.ebay.com%3E
 needs the checkpoint to be deleted. Will do that in the future. Do you still need 
the logs?
~Rahul.




 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org user@flume.apache.org; Rahul Ravindran 
rahu...@yahoo.com 
Sent: Saturday, February 2, 2013 2:32 PM
Subject: Re: FileChannel error on Flume 1.3.1
 

Hi,


That isn't a good error message. Could you share your entire log file? Post it 
on a JIRA or pastebin. 


Did you delete the checkpoint before starting the channel after changing 
capacity? Capacity is fixed for the file channel; as such, the checkpoint must 
be deleted to change the capacity. 


Brock

On Saturday, February 2, 2013, Rahul Ravindran  wrote:

Hi,
  I increased the capacity and the max file size parameters of the 
FileChannel in our config and began seeing the exception below. I continued 
to see it on restarting the channel. I fixed this by removing the 
files in the checkpoint and data folders.


02 Feb 2013 21:45:51,113 WARN  
[SinkRunner-PollingRunner-LoadBalancingSinkProcessor] 
(org.apache.flume.sink.LoadBalancingSinkProcessor.process:158)  - Sink failed 
to consume event. Attempting next sink if available.
java.lang.IllegalStateException: Channel closed [channel=ch2]. Due to 
java.io.EOFException: null
at 
org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at 
org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.AvroSink.process(AvroSink.java:277)
at 
org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:154)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:759)
at java.io.RandomAccessFile.readLong(RandomAccessFile.java:792)
at 
org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:71)
at 
org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:36)
at org.apache.flume.channel.file.Log.replay(Log.java:365)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at 
org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/




[jira] [Created] (FLUME-1900) FileChannel Error

2013-02-04 Thread Rahul Ravindran (JIRA)
Rahul Ravindran created FLUME-1900:
--

 Summary: FileChannel Error
 Key: FLUME-1900
 URL: https://issues.apache.org/jira/browse/FLUME-1900
 Project: Flume
  Issue Type: Question
Reporter: Rahul Ravindran
Assignee: Brock Noland
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (FLUME-1900) FileChannel Error

2013-02-04 Thread Rahul Ravindran (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Ravindran updated FLUME-1900:
---

Attachment: flume.log

 FileChannel Error
 -

 Key: FLUME-1900
 URL: https://issues.apache.org/jira/browse/FLUME-1900
 Project: Flume
  Issue Type: Question
Reporter: Rahul Ravindran
Assignee: Brock Noland
Priority: Minor
 Attachments: flume.log




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Security between Avro-source and Avro-sink

2013-01-31 Thread Rahul Ravindran
Hi,
   Is there a way to have secure communication between 2 Flume machines (one 
of which has an Avro source which forwards data to an Avro sink)?
Thanks,
~Rahul.

Cloudera Manager usage for Flume

2013-01-03 Thread Rahul Ravindran
Hi,
  Are there any additional management/monitoring abilities, or anything else for 
Flume, available via Cloudera Manager?
Thanks,
~Rahul.

Flume 1.3 package

2012-12-19 Thread Rahul Ravindran
Hi,
  Is Flume 1.3 part of CDH4? Is Flume 1.3 part of any Debian repo for 
installation? I have the link for http://flume.apache.org/download.html which 
gives me the tar file. However, this does not install Flume's dependencies. 
Thanks,
~Rahul.

[jira] [Commented] (FLUME-1713) Netcat source should allow for *not* returning OK upon receipt of each message

2012-11-29 Thread Rahul Ravindran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506319#comment-13506319
 ] 

Rahul Ravindran commented on FLUME-1713:


Is there any way to get a patch of this change for v1.3? We plan to deploy 
prior to 1.4, and I would really prefer not to build our own version of v1.3.

 Netcat source should allow for *not* returning OK upon receipt of each 
 message
 

 Key: FLUME-1713
 URL: https://issues.apache.org/jira/browse/FLUME-1713
 Project: Flume
  Issue Type: Improvement
  Components: Easy
Affects Versions: v1.2.0, v1.3.0
Reporter: Mike Percy
Assignee: Rahul Ravindran
Priority: Minor
  Labels: newbie
 Fix For: v1.4.0

 Attachments: 
 0001-FLUME-1713-Netcat-source-should-allow-for-not-return.patch, 
 final_patch.diff


 Right now, the Netcat source returns OK when each message is processed. In 
 reality, this means that using netcat to send to the Netcat source will in 
 most cases not work as expected. It will stall out if the responses are not 
 read back once the TCP receive buffers fill up.
 The default configuration setup should remain the same though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
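
As a rough sketch of what the FLUME-1713 behavior discussed above looks like in 
use, the snippet below configures the Netcat source programmatically with the 
per-event acknowledgement disabled. The bind address, port, and source name are 
illustrative assumptions, and the property name (ack-every-event) follows the 
review discussion later in this archive, so check it against the Flume version 
you actually deploy.

import org.apache.flume.Context;
import org.apache.flume.conf.Configurables;
import org.apache.flume.source.NetcatSource;

// Hedged sketch: configure the Netcat source without per-event "OK" responses.
public class NetcatNoAckSketch {
  public static void main(String[] args) {
    NetcatSource source = new NetcatSource();
    source.setName("nc1");                    // assumed source name

    Context ctx = new Context();
    ctx.put("bind", "0.0.0.0");               // assumed listen address
    ctx.put("port", "44444");                 // assumed port
    ctx.put("ack-every-event", "false");      // do not write "OK" back for each event
    Configurables.configure(source, ctx);

    // Actually starting the source also requires wiring up a ChannelProcessor
    // and channels, which is omitted from this sketch.
  }
}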


Request to add Flume-1713 into Flume v1.3 RC

2012-11-29 Thread Rahul Ravindran
Hello,
   I just joined the dev user mailing list and could not respond to the v1.3 
voting thread.

   We are looking to deploy Flume into our production environment prior to a 
1.4 release and are hoping to use the Netcat source. It would be great if you 
could get https://issues.apache.org/jira/browse/FLUME-1713 into the v1.3 RC. I 
would really prefer to avoid building our own version of v1.3 with this fix. 
This is a low risk fix which does not change any existing behavior. The new 
behavior is guarded with a new config flag.
Thanks,
~Rahul.

[jira] [Commented] (FLUME-1713) Netcat source should allow for *not* returning OK upon receipt of each message

2012-11-29 Thread Rahul Ravindran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506592#comment-13506592
 ] 

Rahul Ravindran commented on FLUME-1713:


[~mpercy], I just joined the dev list and don't seem to be able to respond to 
an older thread. Would be thankful if you could bring this up in the 1.3 VOTE 
thread. I have sent a separate mail to the dev mailing list with the same 
request.

 Netcat source should allow for *not* returning OK upon receipt of each 
 message
 

 Key: FLUME-1713
 URL: https://issues.apache.org/jira/browse/FLUME-1713
 Project: Flume
  Issue Type: Improvement
  Components: Easy
Affects Versions: v1.2.0, v1.3.0
Reporter: Mike Percy
Assignee: Rahul Ravindran
Priority: Minor
  Labels: newbie
 Fix For: v1.4.0

 Attachments: 
 0001-FLUME-1713-Netcat-source-should-allow-for-not-return.patch, 
 final_patch.diff


 Right now, the Netcat source returns OK when each message is processed. In 
 reality, this means that using netcat to send to the Netcat source will in 
 most cases not work as expected. It will stall out if the responses are not 
 read back once the TCP receive buffers fill up.
 The default configuration setup should remain the same though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Request to add Flume-1713 into Flume v1.3 RC

2012-11-29 Thread Rahul Ravindran
Thanks much!



 From: Brock Noland br...@cloudera.com
To: dev@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Thursday, November 29, 2012 9:11 AM
Subject: Re: Request to add Flume-1713 into Flume v1.3 RC
 
I have committed this to flume-1.3.0 branch since it's very low risk.

On Thu, Nov 29, 2012 at 9:42 AM, Rahul Ravindran rahu...@yahoo.com wrote:

 Hello,
    I just joined the dev user mailing list and could not respond to the
 v1.3 voting thread.

    We are looking to deploy Flume into our production environment prior to
 a 1.4 release and are hoping to use the Netcat source. It would be great if
 you could get https://issues.apache.org/jira/browse/FLUME-1713 into the
 v1.3 RC. I would really prefer avoiding building our own version of v1.3
 with this fix. This is a low risk fix which does not change any existing
 behavior. The new behavior is guarded with a new config flag.
 Thanks,
 ~Rahul.




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Review Request: FLUME-1713 Netcat source to not return OK

2012-11-28 Thread Rahul Ravindran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8220/
---

(Updated Nov. 28, 2012, 9:02 p.m.)


Review request for Flume.


Changes
---

Final patch after incorporating Mike's comments


Description
---

FLUME-1713 Netcat source should allow for *not* returning OK upon receipt of 
each message; added a boolean config parameter, ackEveryEvent (default value: 
false), for this source. Added a parameterized test to the existing Netcat 
unit test.


This addresses bug FLUME-1713.
https://issues.apache.org/jira/browse/FLUME-1713


Diffs (updated)
-

  flume-ng-core/src/main/java/org/apache/flume/source/NetcatSource.java 37c09fe 
  
flume-ng-core/src/main/java/org/apache/flume/source/NetcatSourceConfigurationConstants.java
 1d8b5e4 
  flume-ng-doc/sphinx/FlumeUserGuide.rst b4a8868 
  flume-ng-node/src/test/java/org/apache/flume/source/TestNetcatSource.java 
3c17d3d 

Diff: https://reviews.apache.org/r/8220/diff/


Testing
---

Unit test added


Thanks,

Rahul Ravindran



[jira] [Updated] (FLUME-1713) Netcat source should allow for *not* returning OK upon receipt of each message

2012-11-28 Thread Rahul Ravindran (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLUME-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Ravindran updated FLUME-1713:
---

Attachment: final_patch.diff

Final patch after incorporating Mike's comments

 Netcat source should allow for *not* returning OK upon receipt of each 
 message
 

 Key: FLUME-1713
 URL: https://issues.apache.org/jira/browse/FLUME-1713
 Project: Flume
  Issue Type: Improvement
  Components: Easy
Affects Versions: v1.2.0
Reporter: Mike Percy
Priority: Minor
  Labels: newbie
 Attachments: 
 0001-FLUME-1713-Netcat-source-should-allow-for-not-return.patch, 
 final_patch.diff


 Right now, the Netcat source returns OK when each message is processed. In 
 reality, this means that using netcat to send to the Netcat source will in 
 most cases not work as expected. It will stall out if the responses are not 
 read back once the TCP receive buffers fill up.
 The default configuration setup should remain the same though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: FLUME-1713 Netcat source to not return OK

2012-11-27 Thread Rahul Ravindran


 On Nov. 28, 2012, 2:54 a.m., Mike Percy wrote:
  Looks good!
  
  Please make the following changes and then attach the patch to the JIRA:
  * Rename ackEveryEvent to ack-every-event to remain consistent with the 
  existing elements. Even though we have agreed to make all new properties 
  camel caps, in the case of existing components we should remain consistent 
  with the existing stuff
  * Please document this new parameter in the user guide @ 
  flume-ng-doc/sphinx/FlumeUserGuide.rst

Thanks Mike!

I am assuming the rename (of ackEveryEvent to ack-every-event) is only for the 
property name users set in the config (not for the variable names in the code). 
So the new parameter documented in FlumeUserGuide.rst would be ack-every-event. 
Could you confirm?


- Rahul


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8220/#review13797
---


On Nov. 26, 2012, 2:24 a.m., Rahul Ravindran wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/8220/
 ---
 
 (Updated Nov. 26, 2012, 2:24 a.m.)
 
 
 Review request for Flume.
 
 
 Description
 ---
 
 FLUME-1713 Netcat source should allow for *not* returning OK upon receipt of 
 each message; added a boolean config parameter, ackEveryEvent (default value: 
 false), for this source. Added a parameterized test to the existing Netcat 
 unit test.
 
 
 This addresses bug FLUME-1713.
 https://issues.apache.org/jira/browse/FLUME-1713
 
 
 Diffs
 -
 
   flume-ng-core/src/main/java/org/apache/flume/source/NetcatSource.java 
 37c09fe 
   
 flume-ng-core/src/main/java/org/apache/flume/source/NetcatSourceConfigurationConstants.java
  1d8b5e4 
   flume-ng-node/src/test/java/org/apache/flume/source/TestNetcatSource.java 
 3c17d3d 
 
 Diff: https://reviews.apache.org/r/8220/diff/
 
 
 Testing
 ---
 
 Unit test added
 
 
 Thanks,
 
 Rahul Ravindran
 




Review Request: FLUME-1713 Netcat source to not return OK

2012-11-25 Thread Rahul Ravindran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8220/
---

Review request for Flume.


Description
---

FLUME-1713 Netcat source should allow for *not* returning OK upon receipt of 
each message; added a boolean config parameter, ackEveryEvent (default value: 
false), for this source. Added a parameterized test to the existing Netcat 
unit test.


This addresses bug FLUME-1713.
https://issues.apache.org/jira/browse/FLUME-1713


Diffs
-

  flume-ng-core/src/main/java/org/apache/flume/source/NetcatSource.java 37c09fe 
  
flume-ng-core/src/main/java/org/apache/flume/source/NetcatSourceConfigurationConstants.java
 1d8b5e4 
  flume-ng-node/src/test/java/org/apache/flume/source/TestNetcatSource.java 
3c17d3d 

Diff: https://reviews.apache.org/r/8220/diff/


Testing
---

Unit test added


Thanks,

Rahul Ravindran



Re: Review Request: FLUME-1713 Netcat source to not return OK

2012-11-25 Thread Rahul Ravindran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8220/
---

(Updated Nov. 26, 2012, 2:24 a.m.)


Review request for Flume.


Description
---

FLUME-1713 Netcat source should allow for *not* returning OK upon receipt of 
each message; added a boolean config parameter, ackEveryEvent (default value: 
false), for this source. Added a parameterized test to the existing Netcat 
unit test.


This addresses bug FLUME-1713.
https://issues.apache.org/jira/browse/FLUME-1713


Diffs
-

  flume-ng-core/src/main/java/org/apache/flume/source/NetcatSource.java 37c09fe 
  
flume-ng-core/src/main/java/org/apache/flume/source/NetcatSourceConfigurationConstants.java
 1d8b5e4 
  flume-ng-node/src/test/java/org/apache/flume/source/TestNetcatSource.java 
3c17d3d 

Diff: https://reviews.apache.org/r/8220/diff/


Testing
---

Unit test added


Thanks,

Rahul Ravindran



[jira] [Commented] (FLUME-1713) Netcat source should allow for *not* returning OK upon receipt of each message

2012-11-25 Thread Rahul Ravindran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503598#comment-13503598
 ] 

Rahul Ravindran commented on FLUME-1713:


Review board url: https://reviews.apache.org/r/8220/

 Netcat source should allow for *not* returning OK upon receipt of each 
 message
 

 Key: FLUME-1713
 URL: https://issues.apache.org/jira/browse/FLUME-1713
 Project: Flume
  Issue Type: Improvement
  Components: Easy
Affects Versions: v1.2.0
Reporter: Mike Percy
Priority: Minor
  Labels: newbie
 Attachments: 
 0001-FLUME-1713-Netcat-source-should-allow-for-not-return.patch


 Right now, the Netcat source returns OK when each message is processed. In 
 reality, this means that using netcat to send to the Netcat source will in 
 most cases not work as expected. It will stall out if the responses are not 
 read back once the TCP receive buffers fill up.
 The default configuration setup should remain the same though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Running multiple flume versions on the same box

2012-11-21 Thread Rahul Ravindran
Hi,
  This is primarily to address a Flume upgrade scenario in the case of any 
incompatible changes in the future. I tried this with multiple processes of the 
same version, and it appeared to work. Are there any concerns about running 
multiple versions of Flume on the same box (each with a different agent 
configuration and no overlap of ports)? 

Thanks,
~Rahul.

Re: Running multiple flume versions on the same box

2012-11-21 Thread Rahul Ravindran
Absolutely. I don't expect to need to do this and I expect to be staging any 
changes prior to a production deployment. 

This was a question from our operations team about how incompatible upgrades 
would be handled, and I needed to get an idea of how we might handle it if this 
scenario, however rare, does come up.

Thanks for all the info.

~Rahul.




 From: Mike Percy mpe...@apache.org
To: user@flume.apache.org 
Cc: Rahul Ravindran rahu...@yahoo.com 
Sent: Wednesday, November 21, 2012 2:24 PM
Subject: Re: Running multiple flume versions on the same box
 

There are no system level singletons or hard-coded file paths or ports if that 
is what you mean.

But in a production scenario, Flume should be resilient to failures since it 
will just buffer events in the channel at each agent. So why run simultaneous 
versions when doing minor version upgrades? (I can understand it in an OG-to-NG 
migration.) If there is a problem, just take it down and roll back; the rest of 
the system should be fine if you have done sufficient capacity planning (with 
channel sizes) and configuration to tolerate downtime - which I'd strongly 
recommend.

At the end of the day, it's always best to test new versions in staging any 
time you do a software upgrade, including with Flume.

Hope that helps.

Regards,
Mike



On Wed, Nov 21, 2012 at 1:29 PM, Camp, Roy rc...@ebay.com wrote:

We did this when upgrading from 0.9x to FlumeNG 1.3-SNAPSHOT.  Used different 
ports and different logging/data directories.  Worked great.
 
Roy
 
From:Rahul Ravindran [mailto:rahu...@yahoo.com] 
Sent: Wednesday, November 21, 2012 11:24 AM
To: User-flume
Subject: Running multiple flume versions on the same box
 
Hi,
  This is primarily to try and address a flume upgrade scenario in the case of 
any incompatible changes in future. I tried this with multiple processes of 
the same version, and it appeared to work. Are there any concerns on running 
multiple versions of flume on the same box (each with different agent 
configurations where there is no overlap of ports) ? 
 
Thanks,
~Rahul.

[jira] [Commented] (FLUME-1713) Netcat source should allow for *not* returning OK upon receipt of each message

2012-11-21 Thread Rahul Ravindran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502423#comment-13502423
 ] 

Rahul Ravindran commented on FLUME-1713:


Review board id: https://review.cloudera.org/r/2462/

 Netcat source should allow for *not* returning OK upon receipt of each 
 message
 

 Key: FLUME-1713
 URL: https://issues.apache.org/jira/browse/FLUME-1713
 Project: Flume
  Issue Type: Improvement
  Components: Easy
Affects Versions: v1.2.0
Reporter: Mike Percy
Priority: Minor
  Labels: newbie
 Attachments: 
 0001-FLUME-1713-Netcat-source-should-allow-for-not-return.patch


 Right now, the Netcat source returns OK when each message is processed. In 
 reality, this means that using netcat to send to the Netcat source will in 
 most cases not work as expected. It will stall out if the responses are not 
 read back once the TCP receive buffers fill up.
 The default configuration setup should remain the same though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
Thanks for the responses.

Good to know that the only external dependencies are Hadoop and Hbase. We will 
deploy those components only on boxes which are going to have those sinks set 
up.



 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org 
Sent: Monday, November 19, 2012 3:29 PM
Subject: Re: Hadoop jars
 

Flume installs all required binaries, except for Hadoop (and the dependencies 
it would pull in) and HBase. This is because Flume, like most other Hadoop 
ecosystem components is meant to work against binary incompatible versions of 
Hadoop (Hadoop-1/Hadoop2). So instead of packaging hadoop jars with Flume, we 
expect Hadoop to be available on the machines you are running Flume on. Once 
you install Hadoop you should not have any dependency issues. Same is true for 
HBase. 


Hari


-- 
Hari Shreedharan

On Monday, November 19, 2012 at 2:33 PM, Mohit Anchlia wrote:
Easiest way is to install cdh binary and point your flume's classpath to it.


On Mon, Nov 19, 2012 at 2:25 PM, Roshan Naik ros...@hortonworks.com wrote:

Currently, unfortunately, I don't think there is any such documentation. 
A very general answer would be: normally this list would depend on the 
source/sink/channel you are using.
I think it would be nice if the user manual did list these external 
dependencies for each component.
I am not the expert on the HDFS sink, but I don't see why it would depend on 
anything more than HDFS itself. 
-roshan 



On Mon, Nov 19, 2012 at 2:18 PM, Rahul Ravindran rahu...@yahoo.com wrote:

Are there other such libraries which will need to be downloaded? Is there a 
well-defined location for the hadoop jar and any other jars that flume may 
depend on?



 

Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
That is unfortunate. Is it sufficient if I package just hadoop-common.jar or is 
the recommended way essentially doing an apt-get install flume-ng which will 
install the below

# apt-cache depends flume-ng

flume-ng
  Depends: adduser
  Depends: hadoop-hdfs
  Depends: bigtop-utils

My concern is that hadoop-hdfs brings in a ton of other stuff which will not be 
used in any box except the one running the hdfs sink.

Thanks,
~Rahul.


 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Monday, November 19, 2012 4:08 PM
Subject: Re: Hadoop jars
 

Unfortunately, the FileChannel too has a hadoop dependency - even though the 
classes are never used. So you need the hadoop jars (and they should be added 
to FLUME_CLASSPATH in flume-env.sh or HADOOP_HOME/HADOOP_PREFIX should be set) 
on machines which will use the FileChannel. The channel directly does not 
depend on Hadoop anymore, but still needs them in the class path because we 
support migration from the older format to new format. 
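
Concretely, that usually means something like the following in conf/flume-env.sh 
(the paths are assumptions for a typical packaged install):

# Point Flume at the existing Hadoop install instead of bundling the jars.
export HADOOP_HOME=/usr/lib/hadoop
export FLUME_CLASSPATH="$HADOOP_HOME/*:$HADOOP_HOME/lib/*"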


Thanks,
Hari


-- 
Hari Shreedharan

On Monday, November 19, 2012 at 4:04 PM, Rahul Ravindran wrote:
Thanks for the responses.


Good to know that the only external dependencies are Hadoop and Hbase. We will 
deploy those components only on boxes which are going to have those sinks set 
up.




 From: Hari Shreedharan hshreedha...@cloudera.com
To: user@flume.apache.org 
Sent: Monday, November 19, 2012 3:29 PM
Subject: Re: Hadoop jars
 

Flume installs all required binaries, except for Hadoop (and the dependencies 
it would pull in) and HBase. This is because Flume, like most other Hadoop 
ecosystem components is meant to work against binary incompatible versions of 
Hadoop (Hadoop-1/Hadoop2). So instead of packaging hadoop jars with Flume, we 
expect Hadoop to be available on the machines you are running Flume on. Once 
you install Hadoop you should not have any dependency issues. Same is true for 
HBase. 




Hari


-- 
Hari Shreedharan


On Monday, November 19, 2012 at 2:33 PM, Mohit Anchlia wrote:
Easiest way is to install cdh binary and point your flume's classpath to it.


On Mon, Nov 19, 2012 at 2:25 PM, Roshan Naik ros...@hortonworks.com wrote:

Currently, unfortunately, I don't think there is any such documentation. 
A very general answer would be: normally this list would depend on the 
source/sink/channel you are using.
I think it would be nice if the user manual did list these external 
dependencies for each component.
I am not the expert on the HDFS sink, but I don't see why it would depend on 
anything more than HDFS itself. 
-roshan 



On Mon, Nov 19, 2012 at 2:18 PM, Rahul Ravindran rahu...@yahoo.com wrote:

Are there other such libraries which will need to be downloaded? Is there a 
well-defined location for the hadoop jar and any other jars that flume may 
depend on?



 



 

Re: Hadoop jars

2012-11-19 Thread Rahul Ravindran
Thanks. We will use that. 

Sent from my phone. Excuse the terseness.

On Nov 19, 2012, at 4:53 PM, Hari Shreedharan hshreedha...@cloudera.com wrote:

 No, you don't need HDFS. Hadoop common / Hadoop core should be enough. But 
 make sure you add it to the classpath as I mentioned before.
 
 Hari
 
 On Nov 19, 2012, at 4:27 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 
 That is unfortunate. Is it sufficient if I package just hadoop-common.jar or 
 is the recommended way essentially doing an apt-get install flume-ng which 
 will install the below
 
 # apt-cache depends flume-ng
 
 flume-ng
   Depends: adduser
   Depends: hadoop-hdfs
   Depends: bigtop-utils
 
 My concern is that hadoop-hdfs brings in a ton of other stuff which will not 
 be used in any box except the one running the hdfs sink.
 
 Thanks,
 ~Rahul.
 From: Hari Shreedharan hshreedha...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
 Sent: Monday, November 19, 2012 4:08 PM
 Subject: Re: Hadoop jars
 
 Unfortunately, the FileChannel too has a hadoop dependency - even though the 
 classes are never used. So you need the hadoop jars (and they should be 
 added to FLUME_CLASSPATH in flume-env.sh or HADOOP_HOME/HADOOP_PREFIX should 
 be set) on machines which will use the FileChannel. The channel directly 
 does not depend on Hadoop anymore, but still needs them in the class path 
 because we support migration from the older format to new format.
 
 
 Thanks,
 Hari
 
 -- 
 Hari Shreedharan
 
 On Monday, November 19, 2012 at 4:04 PM, Rahul Ravindran wrote:
 Thanks for the responses.
 
 Good to know that the only external dependencies are Hadoop and Hbase. We 
 will deploy those components only on boxes which are going to have those 
 sinks set up.
 
 From: Hari Shreedharan hshreedha...@cloudera.com
 To: user@flume.apache.org 
 Sent: Monday, November 19, 2012 3:29 PM
 Subject: Re: Hadoop jars
 
 Flume installs all required binaries, except for Hadoop (and the 
 dependencies it would pull in) and HBase. This is because Flume, like most 
 other Hadoop ecosystem components is meant to work against binary 
 incompatible versions of Hadoop (Hadoop-1/Hadoop2). So instead of packaging 
 hadoop jars with Flume, we expect Hadoop to be available on the machines 
 you are running Flume on. Once you install Hadoop you should not have any 
 dependency issues. Same is true for HBase.
 
 
 Hari
 
 -- 
 Hari Shreedharan
 
 On Monday, November 19, 2012 at 2:33 PM, Mohit Anchlia wrote:
 Easiest way is to install cdh binary and point your flume's classpath to 
 it.
 
 On Mon, Nov 19, 2012 at 2:25 PM, Roshan Naik ros...@hortonworks.com 
 wrote:
 Currently, unfortunately, I don't think there is any such documentation.
 A very general answer would be: normally this list would depend on the 
 source/sink/channel you are using.
 I think it would be nice if the user manual did list these external 
 dependencies for each component.
 I am not the expert on the HDFS sink, but I don't see why it would depend on 
 anything more than HDFS itself. 
 -roshan
 
 
 On Mon, Nov 19, 2012 at 2:18 PM, Rahul Ravindran rahu...@yahoo.com 
 wrote:
 Are there other such libraries which will need to be downloaded? Is 
 there a well-defined location for the hadoop jar and any other jars that 
 flume may depend on?
 
 
 


Re: Flume hops behind HAProxy

2012-11-15 Thread Rahul Ravindran
HAProxy has a TCP mode where it round robins TCP connections. Does it need to 
understand something specific about the wire protocol used by Flume?
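
For reference, a rough haproxy.cfg sketch of the setup being discussed - TCP-mode 
round robin across two hops (hostnames and ports are assumptions):

listen flume_hops
    bind *:4545
    mode tcp
    balance roundrobin
    # Whole TCP connections are balanced; each agent's persistent Avro
    # connection sticks to whichever hop it was handed.
    server hop1 flume-hop1.example.com:4545 check
    server hop2 flume-hop2.example.com:4545 check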



 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Wednesday, November 14, 2012 6:20 PM
Subject: Re: Flume hops behind HAProxy
 
  It would be round robin but not sticky sessions (so each request could go to 
  any random flume hop)

Does HAProxy understand the protocol?  To round robin requests like
this it needs to understand the communication protocol, which I
suppose would work if you were using the HTTPSource.

On Wed, Nov 14, 2012 at 4:46 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 HAProxy is already widely deployed in our environment, and Ops is familiar
 with dealing with it for hosts which go down, etc.

 
 From: Camp, Roy rc...@ebay.com
 To: user@flume.apache.org user@flume.apache.org
 Sent: Wednesday, November 14, 2012 2:15 PM
 Subject: RE: Flume hops behind HAProxy

 Out of curiosity, what is the use case vs using the built in load balancing?



 -Original Message-
 From: Rahul Ravindran [mailto:rahu...@yahoo.com]
 Sent: Wednesday, November 14, 2012 1:49 PM
 To: user@flume.apache.org
 Cc: user@flume.apache.org
 Subject: Re: Flume hops behind HAProxy

 It would be round robin but not sticky sessions (so each request could go to
 any random flume hop)

 Sent from my phone. Excuse the terseness.

 On Nov 14, 2012, at 1:33 PM, Brock Noland br...@cloudera.com wrote:

 I assume it would be connection-based round robin?  Might work just
 fine, but probably best to use the built-in support.

 On Wed, Nov 14, 2012 at 2:46 PM, Rahul Ravindran rahu...@yahoo.com
 wrote:
 Resending given I sent it during off-hours.

 
 From: Rahul Ravindran rahu...@yahoo.com
 To: user@flume.apache.org user@flume.apache.org
 Sent: Tuesday, November 13, 2012 5:52 PM
 Subject: Flume hops behind HAProxy

 Hi,
  Before I try it, I wanted to check if there were any known issues
 with this. We will have multiple flume agents sending an Avro stream
 each to a smaller set of intermediate flume hops. Are there any
 issues/concerns around having the flume agents send their streams to
 an HAProxy which will round robin between the different flume hops.
 Any issue around the transaction mechanism with this setup?

 I know that there is a selector mechanism in Flume to do this, but
 our operations extensively use HAProxy, and are most familiar with it.

 Thanks,
 ~Rahul.



 --
 Apache MRUnit - Unit testing MapReduce -
 http://incubator.apache.org/mrunit/





-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Flume hops behind HAProxy

2012-11-14 Thread Rahul Ravindran
Resending given I sent it during off-hours.



 From: Rahul Ravindran rahu...@yahoo.com
To: user@flume.apache.org user@flume.apache.org 
Sent: Tuesday, November 13, 2012 5:52 PM
Subject: Flume hops behind HAProxy
 

Hi,
   Before I try it, I wanted to check if there were any known issues with this. 
We will have multiple flume agents sending an Avro stream each to a smaller set 
of intermediate flume hops. Are there any issues/concerns around having the 
flume agents send their streams to an HAProxy which will round robin between 
the different flume hops. Any issue around the transaction mechanism with this 
setup?

I know that there is a selector mechanism in Flume to do this, but our 
operations extensively use HAProxy, and are most familiar with it.
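
For comparison, the built-in equivalent would be a load-balancing sink group on 
each agent; a minimal sketch (sink names and settings are assumptions):

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = avroSink1 avroSink2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true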

Thanks,
~Rahul.

Re: high level plugin architecture

2012-11-13 Thread Rahul Ravindran
In the 1.3 snapshot documentation, I don't see anything about the spool 
directory source. Is that ready?

Sent from my phone. Excuse the terseness.

On Nov 13, 2012, at 9:43 AM, Hari Shreedharan hshreedha...@cloudera.com wrote:

 You can find the details of the components and how to wire them together 
 here: http://flume.apache.org/FlumeUserGuide.html
 
 
 Thanks,
 Hari
 
 -- 
 Hari Shreedharan
 
 On Tuesday, November 13, 2012 at 6:26 AM, S Ahmed wrote:
 
 Hello,
 
 So I downloaded the flume source, and I was hoping someone can go over the 
 high-level plugin architecture.
 
 So each major feature of flume like a sink, or a channel has an interface, 
 and then concrete implementations implement the interface.
 
 How exactly do you wire up the type of sink or channel you want to use? Is it 
 using IoC, or do you just put the package/class in the config file and then 
 it assumes your .jar is on the classpath?
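
 For example, something like this in the config file (the class name is a made-up 
 placeholder, not a real class):

 # The type can be a built-in alias (e.g. avro) or a fully qualified class name;
 # this assumes the jar containing the class is on Flume's classpath.
 agent1.sinks.custom1.type = com.example.MyCustomSink
 agent1.sinks.custom1.channel = ch1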
 


Re: high level plugin architecture

2012-11-13 Thread Rahul Ravindran
The link below mentioned that it is the Flume 1.3 Snapshot guide and I expected 
documentation regarding Spool Directory here. I did not see it. Am I missing 
something?




 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org 
Sent: Tuesday, November 13, 2012 10:12 AM
Subject: Re: high level plugin architecture
 
Where are you seeing that? I see that documented in the 1.3.0 branch
under Spooling Directory Source


On Tue, Nov 13, 2012 at 11:57 AM, Rahul Ravindran rahu...@yahoo.com wrote:
 In the 1.3 snapshot documentation, I don't see anything about the spool
 directory source. Is that ready?

 Sent from my phone. Excuse the terseness.

 On Nov 13, 2012, at 9:43 AM, Hari Shreedharan hshreedha...@cloudera.com
 wrote:

 You can find the details of the components and how to wire them together
 here: http://flume.apache.org/FlumeUserGuide.html


 Thanks,
 Hari

 --
 Hari Shreedharan

 On Tuesday, November 13, 2012 at 6:26 AM, S Ahmed wrote:

 Hello,

 So I downloaded the flume source, and I was hoping someone can go over the
 high-level plugin architecture.

 So each major feature of flume like a sink, or a channel has an interface,
 and then concrete implementations implement the interface.

 How exactly do you wire up the type of sink or channel you want to use? Is it
 using IoC, or do you just put the package/class in the config file and then
 it assumes your .jar is on the classpath?





-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Netcat source stops processing data

2012-11-08 Thread Rahul Ravindran
Thanks! Will try removing the ok. 

Sent from my phone. Excuse the terseness.

On Nov 8, 2012, at 3:15 PM, Hari Shreedharan hshreedha...@cloudera.com wrote:

 Rahul,
 
 Are you reading the responses sent by the netcat source? If you don't read 
 the OK sent by the netcat source on your application side, your application's 
 buffer gets full, causing the netcat source to queue up stuff and eventually 
 die. It is something we need to fix, but I don't know if anyone is using netcat 
 in production - you should probably test using the Avro source or the new 
 HTTP source (for this you would need to build the trunk/1.3 branch or wait for 
 the 1.3 release).
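
 For reference, a minimal Avro source sketch for that kind of load test (bind 
 address and port are assumptions):

 agent1.sources.avroSrc.type = avro
 agent1.sources.avroSrc.bind = 0.0.0.0
 agent1.sources.avroSrc.port = 41414
 agent1.sources.avroSrc.channels = ch1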
 
 
 Thanks
 Hari
 
 -- 
 Hari Shreedharan
 
 On Thursday, November 8, 2012 at 3:05 PM, Rahul Ravindran wrote:
 
 Hello,
   I wanted to perform a load test to get an idea of how we would look to 
 scale flume for our deployment. I have pasted the config file at the source 
 below. I have a netcat source which is listening on a port and have 2 
 channels, 2 avro sinks consuming the events from the netcat source.
 
 My load generator is a simple C program which is continually sending 20 
 characters in each message using a socket, and send(). I notice that, 
 initially, a lot of traffic makes it through, and then the flume agent 
 appears to stop consuming data (after about 80k messages). This results in 
 the TCP receive and send buffers being full. I understand that the rate at 
 which I am generating traffic may overwhelm flume, but I would expect it to 
 gradually consume data. It does not consume any more messages. I looked 
 through the flume logs and did not see anything there (no stack trace). I 
 ran tcpdump and saw that the receive window is initially non-zero but begins 
 to decrease and then goes down to zero, and very slowly opens up to a size 
 of 1 (once in 10 seconds).
 
 Could you help me figure out what may be going on, or whether there is 
 something wrong with my config?
 
 agent1.channels.ch1.type = MEMORY
 agent1.channels.ch1.capacity = 5
 agent1.channels.ch1.transactionCapacity = 5000
 
 agent1.sources.netcat.channels = ch1
 agent1.sources.netcat.type= netcat
 agent1.sources.netcat.bind = 127.0.0.1
 agent1.sources.netcat.port = 4
 
 agent1.sinks.avroSink1.type = avro
 agent1.sinks.avroSink1.channel = ch1
 agent1.sinks.avroSink1.hostname = remote hostname
 agent1.sinks.avroSink1.port = 4545
 agent1.sinks.avroSink1.connect-timeout = 30
 
 
 agent1.sinks.avroSink2.type = avro
 agent1.sinks.avroSink2.channel = ch1
 agent1.sinks.avroSink2.hostname = remote hostname
 agent1.sinks.avroSink2.port = 4546
 agent1.sinks.avroSink2.connect-timeout = 30
 
 agent1.channels = ch1
 agent1.sources = netcat
 agent1.sinks = avroSink1 avroSink2 avroSink2
 


Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Rahul Ravindran
Ping on the below questions about the new Spool Directory source:

If we choose to use the memory channel with this source, to an Avro sink on a 
remote box, do we risk data loss in the eventuality of a network partition/slow 
network or if the flume-agent on the source box dies?
If we choose to use the file channel with this source, we will end up with double 
writes to disk, correct? (one for the legacy log files which will be ingested 
by the Spool Directory source, and the other for the WAL)





 From: Rahul Ravindran rahu...@yahoo.com
To: user@flume.apache.org user@flume.apache.org 
Sent: Tuesday, November 6, 2012 3:40 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 

This is awesome. 
This may be perfect for our use case :)

When is the 1.3 release expected?

Couple of questions for the choice of channel for the new source:

If we choose to use the memory channel with this source, to an Avro sink on a 
remote box, do we risk data loss in the eventuality of a network partition/slow 
network or if the flume-agent on the source box dies?
If we choose to use the file channel with this source, we will end up with double 
writes to disk, correct? (one for the legacy log files which will be ingested 
by the Spool Directory source, and the other for the WAL)

Thanks,
~Rahul.



 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Tuesday, November 6, 2012 3:05 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 
This use case sounds like a perfect use of the Spool Directory source
which will be in the upcoming 1.3 release.

Brock
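
For reference, a minimal sketch of what that source looks like in a 1.3-style 
config (the directory and names are assumptions):

agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/log/legacy/completed
agent1.sources.spool1.fileHeader = true
agent1.sources.spool1.channels = ch1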

On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We will update the checkpoint each time (we may tune this to be
 periodic)
 but the contents of the memory channel will be in the legacy logs which are
 currently being generated.

 Additionally, the sink for the memory channel will be an Avro source in
 another machine.

 Does that clear things up?

 
 From: Brock Noland br...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, November 6, 2012 1:44 PM

 Subject: Re: Guarantees of the memory channel for delivering to sink

 But in your architecture you are going to write the contents of the
 memory channel out? Or did I miss
 something?

 The checkpoint will be updated each time we perform a successive
 insertion into the memory channel.

 On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We have a legacy system which writes events to a file (existing log file).
 This will continue. If I used a filechannel, I will be double the number
 of
 IO operations(writes to the legacy log file, and writes to WAL).

 
 From: Brock Noland br...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, November 6, 2012 1:38 PM
 Subject: Re: Guarantees of the memory channel for delivering to sink

 You're still going to be writing out all events, no? So how would file
 channel do more IO than that?

 On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 Hi,
    I am very new to Flume and we are hoping to use it for our log
 aggregation into HDFS. I have a few questions below:

 FileChannel will double our disk IO, which will affect IO performance on
 certain performance sensitive machines. Hence, I was hoping to write a
 custom Flume source which will use a memory channel, and which
 will
 perform
 checkpointing. The checkpoint will be updated each time we perform a
 successive insertion into the memory channel. (I realize that this
 results
 in a risk of data loss, the maximum size of which is the capacity of the
 memory
 channel).

    As long as there is capacity in the memory channel buffers, does the
 memory channel guarantee delivery to a sink (does it wait for
 acknowledgements, and retry failed packets)? This would mean that we need
 to
 ensure that we do not exceed the channel capacity.

 I am writing a custom source which will use the memory channel, and which
 will catch a ChannelException to identify any channel capacity issues(so,
 buffer used in the memory channel
 is full because of lagging
 sinks/network
 issues etc). Is that a reasonable assumption to make?

 Thanks,
 ~Rahul.



 --
 Apache MRUnit - Unit testing MapReduce -
 http://incubator.apache.org/mrunit/





 --
 Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/





-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

Re: Guarantees of the memory channel for delivering to sink

2012-11-07 Thread Rahul Ravindran
Hi,

Thanks for the response.

Does the memory channel provide transactional guarantees? In the event of a 
network packet loss, does it retry sending the packet? If we ensure that we do 
not exceed the capacity for the memory channel, does it continue retrying to 
send an event to the remote source on failure?

Thanks,
~Rahul.



 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Wednesday, November 7, 2012 11:48 AM
Subject: Re: Guarantees of the memory channel for delivering to sink
 

Hi,

Yes if you use memory channel, you can lose data. To not lose data, file 
channel needs to write to disk...

Brock


On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran rahu...@yahoo.com wrote:

Ping on the below questions about the new Spool Directory source:


If we choose to use the memory channel with this source, to an Avro sink on a 
remote box, do we risk data loss in the eventuality of a network 
partition/slow network or if the flume-agent on the source box dies?
If we choose to use the file channel with this source, we will end up with double 
writes to disk, correct? (one for the legacy log files which will be ingested 
by the Spool Directory source, and the other for the WAL)






 From: Rahul Ravindran rahu...@yahoo.com
To: user@flume.apache.org user@flume.apache.org 
Sent: Tuesday, November 6, 2012 3:40 PM

Subject: Re: Guarantees of the memory channel for delivering to sink
 


This is awesome. 
This may be perfect for our use case :)


When is the 1.3 release expected?


Couple of questions for the choice of channel for the new source:


If we choose to use the memory channel with this source, to an Avro sink on a 
remote box, do we risk data loss in the eventuality of a network 
partition/slow network or if the flume-agent on the source box dies?
If we choose to use the file channel with this source, we will end up with double 
writes to disk, correct? (one for the legacy log files which will be ingested 
by the Spool Directory source, and the other for the WAL)


Thanks,
~Rahul.




 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Tuesday, November 6, 2012 3:05 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 
This use case sounds like a perfect use of the Spool Directory source
which will be in the upcoming 1.3 release.

Brock

On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We will update the checkpoint each time
 (we may tune this to be
 periodic)
 but the contents of the memory channel will be in the legacy logs which are
 currently being generated.

 Additionally, the sink for the memory channel will be an Avro source in
 another machine.

 Does that clear things up?

 
 From: Brock Noland br...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, November 6, 2012 1:44 PM

 Subject: Re: Guarantees of the memory channel for delivering to sink

 But in your architecture you
 are going to write the contents of the
 memory channel out? Or did I miss
 something?

 The checkpoint will be updated each time we perform a successive
 insertion into the memory channel.

 On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We have a legacy system which writes events to a file (existing log file).
 This will continue. If I used a filechannel, I will be double the number
 of
 IO operations(writes to the legacy log file, and writes to WAL).

 
 From: Brock Noland br...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, November 6, 2012 1:38 PM
 Subject: Re: Guarantees of the memory channel for delivering to sink

 You're still going to be writing out all events, no? So how would file
 channel do more IO than that?

 On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 Hi,
    I am very new to Flume and we are hoping to use it for our log
 aggregation into HDFS. I have a few questions below:

 FileChannel will double our disk IO, which will affect IO
 performance on
 certain performance sensitive machines. Hence, I was hoping to write a
 custom Flume source which will use a memory channel, and which
 will
 perform
 checkpointing. The checkpoint will be updated each time we perform a
 successive insertion into the memory channel. (I realize that this
 results
 in a risk of data loss, the maximum size of which is the capacity of the
 memory
 channel).

    As long as there is capacity in the memory channel buffers, does the
 memory channel guarantee delivery to a sink (does it wait for
 acknowledgements, and retry failed packets)? This would mean that we need
 to
 ensure that we do not exceed the channel capacity.

 I am writing a custom source which will use the memory channel, and which
 will catch

Adding an interceptor

2012-11-07 Thread Rahul Ravindran
Apologies. I am new to Flume, and I am probably missing something fairly 
obvious. I am attempting to test using a timestamp interceptor and a host 
interceptor, but I see only a sequence of numbers at the remote end.

Below is the flume config:



agent1.channels.ch1.type = MEMORY
agent1.channels.ch1.capacity = 500

agent1.sources.seq_gen.channels = ch1
agent1.sources.seq_gen.type = SEQ

agent1.sources.seq_gen.interceptors = inter1 host1
#agent1.sources.seq_gen.interceptors.inter1.type = 
org.apache.flume.interceptor.TimestampInterceptor$Builder
agent1.sources.seq_gen.interceptors.inter1.type = TIMESTAMP
agent1.sources.seq_gen.interceptors.inter1.preserveExisting = false

#agent1.sources.seq_gen.interceptors.host1.type = 
org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.seq_gen.interceptors.host1.type = HOST
agent1.sources.seq_gen.interceptors.host1.preserveExisting = false
agent1.sources.seq_gen.interceptors.host1.hostHeader = hostname
agent1.sources.seq_gen.interceptors.host1.useIP = false


agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.channel = ch1
agent1.sinks.avroSink1.hostname = remote server
agent1.sinks.avroSink1.port = 4545
agent1.sinks.avroSink1.connect-timeout = 30

agent1.channels = ch1
agent1.sources = seq_gen
agent1.sinks = avroSink1
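
The interceptors above only add headers (timestamp and hostname); the event body 
from the SEQ source stays a bare sequence number, so the headers only show up 
where something reads them. A sketch of how a downstream HDFS sink would 
typically consume those headers (the agent name and path are assumptions, not 
part of this config):

# %{hostname} reads the header set by host1; the %Y-%m-%d escapes need the
# timestamp header set by inter1.
agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.channel = ch1
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%{hostname}/%Y-%m-%d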

Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
Hi,
   I am very new to Flume and we are hoping to use it for our log aggregation 
into HDFS. I have a few questions below:

FileChannel will double our disk IO, which will affect IO performance on 
certain performance sensitive machines. Hence, I was hoping to write a custom 
Flume source which will use a memory channel, and which will perform 
checkpointing. The checkpoint will be updated each time we perform a successive 
insertion into the memory channel. (I realize that this results in a risk of 
data loss, the maximum size of which is the capacity of the memory channel).

   As long as there is capacity in the memory channel buffers, does the memory 
channel guarantee delivery to a sink (does it wait for acknowledgements, and 
retry failed packets)? This would mean that we need to ensure that we do not 
exceed the channel capacity.

I am writing a custom source which will use the memory channel, and which will 
catch a ChannelException to identify any channel capacity issues(so, buffer 
used in the memory channel is full because of lagging sinks/network issues 
etc). Is that a reasonable assumption to make?
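
For what it's worth, the relevant knobs on the memory channel look roughly like 
this (the numbers are illustrative assumptions); a put that still finds the 
channel full after the keep-alive wait is what surfaces as a ChannelException:

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 1000
# Seconds a put or take will wait for space/data before giving up.
agent1.channels.ch1.keep-alive = 3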

Thanks,
~Rahul.

Re: Guarantees of the memory channel for delivering to sink

2012-11-06 Thread Rahul Ravindran
We will update the checkpoint each time (we may tune this to be periodic) but 
the contents of the memory channel will be in the legacy logs which are 
currently being generated.


Additionally, the sink for the memory channel will be an Avro source in another 
machine.

Does that clear things up?



 From: Brock Noland br...@cloudera.com
To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com 
Sent: Tuesday, November 6, 2012 1:44 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 
But in your architecture you are going to write the contents of the
memory channel out? Or did I miss something?

The checkpoint will be updated each time we perform a successive
insertion into the memory channel.

On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 We have a legacy system which writes events to a file (existing log file).
 This will continue. If I used a filechannel, I will be double the number of
 IO operations(writes to the legacy log file, and writes to WAL).

 
 From: Brock Noland br...@cloudera.com
 To: user@flume.apache.org; Rahul Ravindran rahu...@yahoo.com
 Sent: Tuesday, November 6, 2012 1:38 PM
 Subject: Re: Guarantees of the memory channel for delivering to sink

You're still going to be writing out all events, no? So how would file
channel do more IO than that?

 On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran rahu...@yahoo.com wrote:
 Hi,
    I am very new to Flume and we are hoping to use it for our log
 aggregation into HDFS. I have a few questions below:

 FileChannel will double our disk IO, which will affect IO performance on
 certain performance sensitive machines. Hence, I was hoping to write a
 custom Flume source which will use a memory channel, and which will
 perform
 checkpointing. The checkpoint will be updated each time we perform a
 successive insertion into the memory channel. (I realize that this results
 in a risk of data loss, the maximum size of which is the capacity of the memory
 channel).

    As long as there is capacity in the memory channel buffers, does the
 memory channel guarantee delivery to a sink (does it wait for
 acknowledgements, and retry failed packets)? This would mean that we need
 to
 ensure that we do not exceed the channel capacity.

 I am writing a custom source which will use the memory channel, and which
 will catch a ChannelException to identify any channel capacity issues(so,
 buffer used in the memory channel is full because of lagging sinks/network
 issues etc). Is that a reasonable assumption to make?

 Thanks,
 ~Rahul.



 --
 Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/





-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

[jira] [Commented] (FLUME-1227) Introduce some sort of SpillableChannel

2012-11-05 Thread Rahul Ravindran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491154#comment-13491154
 ] 

Rahul Ravindran commented on FLUME-1227:


Is there a timeline on when this new channel would be out?

 Introduce some sort of SpillableChannel
 ---

 Key: FLUME-1227
 URL: https://issues.apache.org/jira/browse/FLUME-1227
 Project: Flume
  Issue Type: New Feature
  Components: Channel
Reporter: Jarek Jarcec Cecho
Assignee: Patrick Wendell

 I would like to introduce a new channel that would behave similarly to scribe 
 (https://github.com/facebook/scribe). It would be something between the memory 
 and file channels. Input events would be saved directly to memory (only) 
 and would be served from there. In case the memory is full, we 
 would spill the events to file.
 Let me describe the use case behind this request. We have plenty of frontend 
 servers that are generating events. We want to send all events to just a 
 limited number of machines, from where we would send the data to HDFS (some 
 sort of staging layer). The reason for this second layer is our need to decouple 
 event aggregation and front-end code onto separate machines. Using the memory 
 channel is fully sufficient, as we can survive the loss of some portion of the 
 events. However, in order to sustain maintenance windows or networking issues, 
 we would have to end up with a lot of memory assigned to those staging 
 machines. The referenced scribe deals with this problem by implementing the 
 following logic - events are saved in memory, similarly to our MemoryChannel. 
 However, in case the memory gets full (because of maintenance, networking 
 issues, ...), it will spill data to disk, where it will sit until 
 everything starts working again.
 I would like to introduce a channel that would implement similar logic. Its 
 durability guarantees would be the same as MemoryChannel - in case someone 
 removed the power cord, this channel would lose data. Based on the 
 discussion in FLUME-1201, I would propose to have the implementation 
 completely independent of any other channel's internal code.
 Jarcec
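
 For reference, when a channel along these lines eventually shipped as the 
 Spillable Memory Channel in later Flume releases, its configuration looked 
 roughly like this sketch (names, numbers, and paths are assumptions):

 a1.channels.c1.type = SPILLABLEMEMORY
 a1.channels.c1.memoryCapacity = 10000
 a1.channels.c1.overflowCapacity = 1000000
 # Overflow is backed by file-channel style storage.
 a1.channels.c1.checkpointDir = /var/lib/flume/c1/checkpoint
 a1.channels.c1.dataDirs = /var/lib/flume/c1/data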

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


static library for X11

2009-07-03 Thread rahul ravindran
Hello,
Is there any way to compile the X library statically?
___
xorg mailing list
xorg@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/xorg

[fltk.general] iconview

2008-12-24 Thread RAHUL RAVINDRAN
hello,
Does anybody know how I can create an icon view in FLTK 2.0?



___
fltk mailing list
fltk@easysw.com
http://lists.easysw.com/mailman/listinfo/fltk


[fltk.general] event handling

2008-12-22 Thread RAHUL RAVINDRAN
How can I handle a key press event for the Input widget using fluid2? I am using 
FLTK 2.0. Please help me.



___
fltk mailing list
fltk@easysw.com
http://lists.easysw.com/mailman/listinfo/fltk


[nanogui] keypress problem in fltk2.0

2008-12-22 Thread RAHUL RAVINDRAN
hi,
How can I access the keypress event for FLTK 2.0 by using fluid2?
please help me.




[nanogui] TEXTBOX

2008-08-26 Thread RAHUL RAVINDRAN
Sir,
Is it possible to have a textbox in nano-X itself (not using the window API or FLTK 
libraries)?
If possible, can you give me some hint or idea?
Thank you.




[nanogui] font problem

2008-08-23 Thread RAHUL RAVINDRAN
sir,
 Currently I am working on nano-X and have created a small application which 
creates a simple window which has buttons and labels.
I am trying to set the size of the font of the buttons and labels, but the size 
does not change.
I want to change the size of the font which is displayed on the button.
I used GrCreateFont() and GrSetFontSize(), but they do not change the font size.

please help me.




[nanogui] FlTK's help and nanox's help

2008-08-22 Thread RAHUL RAVINDRAN
Thank you for the reply.
I downloaded FLTK, but when I executed some of the FLTK applications, they ran 
separately and not on the nano-X server.
I liked the applications, but I wanted the FLTK application, when I execute it, 
not to be displayed immediately; it should be displayed within the nano-X 
server window (the client area, i.e. the black screen) and not separately. This 
same problem also occurs when I use the window API for creating a window.

And can you explain to me, or give me some way, so that I can learn the working of 
events in nano-X (using nanox.h), and can I make a textbox which takes input 
from the user in a nano-X application?
 



[nanogui] compiling problem

2008-08-20 Thread RAHUL RAVINDRAN
Thank you for the reply.
But I tried the compiling and linking command:

 gcc -O -lm -L/usr/X11R6/lib -lX11 -o mtest2 mtest2.c  -lXft -lmwin -lmwimages 
-lm /usr/X11R6/lib/libX11.a

but the following errors are generated:

/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_destroyfont':
/root/microwindows-0.91/src/engine/font_freetype.c:621: undefined reference to 
`TT_Close_Face'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `GdGetFontList':
/root/microwindows-0.91/src/engine/font_freetype.c:866: undefined reference to 
`TT_Init_FreeType'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `get_tt_name':
/root/microwindows-0.91/src/engine/font_freetype.c:824: undefined reference to 
`TT_Open_Face'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `tt_lookup_name':
/root/microwindows-0.91/src/engine/font_freetype.c:769: undefined reference to 
`TT_Get_Face_Properties'
/root/microwindows-0.91/src/engine/font_freetype.c:773: undefined reference to 
`TT_Get_Name_ID'
/root/microwindows-0.91/src/engine/font_freetype.c:774: undefined reference to 
`TT_Get_Name_String'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `get_tt_name':
/root/microwindows-0.91/src/engine/font_freetype.c:831: undefined reference to 
`TT_Close_Face'
/root/microwindows-0.91/src/engine/font_freetype.c:831: undefined reference to 
`TT_Close_Face'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_createfont':
/root/microwindows-0.91/src/engine/font_freetype.c:131: undefined reference to 
`TT_Open_Face'
/root/microwindows-0.91/src/engine/font_freetype.c:136: undefined reference to 
`TT_Load_Kerning_Table'
/root/microwindows-0.91/src/engine/font_freetype.c:154: undefined reference to 
`TT_Get_Face_Properties'
/root/microwindows-0.91/src/engine/font_freetype.c:169: undefined reference to 
`TT_New_Glyph'
/root/microwindows-0.91/src/engine/font_freetype.c:173: undefined reference to 
`TT_New_Instance'
/root/microwindows-0.91/src/engine/font_freetype.c:177: undefined reference to 
`TT_Set_Instance_Resolutions'
/root/microwindows-0.91/src/engine/font_freetype.c:190: undefined reference to 
`TT_Get_CharMap'
/root/microwindows-0.91/src/engine/font_freetype.c:184: undefined reference to 
`TT_Get_CharMap_ID'
/root/microwindows-0.91/src/engine/font_freetype.c:139: undefined reference to 
`TT_Get_Kerning_Directory'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function 
`freetype_setfontrotation':
/root/microwindows-0.91/src/engine/font_freetype.c:655: undefined reference to 
`TT_Set_Instance_Transform_Flags'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_setfontsize':
/root/microwindows-0.91/src/engine/font_freetype.c:637: undefined reference to 
`TT_Set_Instance_PixelSizes'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_getfontinfo':
/root/microwindows-0.91/src/engine/font_freetype.c:492: undefined reference to 
`TT_Get_Face_Properties'
/root/microwindows-0.91/src/engine/font_freetype.c:493: undefined reference to 
`TT_Get_Instance_Metrics'
/root/microwindows-0.91/src/engine/font_freetype.c:504: undefined reference to 
`TT_CharMap_First'
/root/microwindows-0.91/src/engine/font_freetype.c:505: undefined reference to 
`TT_CharMap_Last'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `Get_Glyph_Width':
/root/microwindows-0.91/src/engine/font_freetype.c:244: undefined reference to 
`TT_Char_Index'
/root/microwindows-0.91/src/engine/font_freetype.c:244: undefined reference to 
`TT_Load_Glyph'
/root/microwindows-0.91/src/engine/font_freetype.c:249: undefined reference to 
`TT_Load_Glyph'
/root/microwindows-0.91/src/engine/font_freetype.c:254: undefined reference to 
`TT_Get_Glyph_Metrics'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_init':
/root/microwindows-0.91/src/engine/font_freetype.c:87: undefined reference to 
`TT_Init_FreeType'
/root/microwindows-0.91/src/engine/font_freetype.c:92: undefined reference to 
`TT_Init_Kerning_Extension'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `freetype_drawtext':
/root/microwindows-0.91/src/engine/font_freetype.c:395: undefined reference to 
`TT_Get_Face_Properties'
/root/microwindows-0.91/src/engine/font_freetype.c:396: undefined reference to 
`TT_Get_Instance_Metrics'
/root/microwindows-0.91/src/engine/font_freetype.c:430: undefined reference to 
`TT_Char_Index'
/root/microwindows-0.91/src/engine/font_freetype.c:432: undefined reference to 
`TT_Load_Glyph'
/root/microwindows-0.91/src/engine/font_freetype.c:441: undefined reference to 
`TT_Get_Glyph_Metrics'
/root/microwindows-0.91/src/engine/font_freetype.c:447: undefined reference to 
`TT_Transform_Vector'
/usr/X11R6/lib/libmwin.a(font_freetype.o): In function `drawchar':
/root/microwindows-0.91/src/engine/font_freetype.c:275: undefined reference to 
`TT_Get_Glyph_Outline'
/root/microwindows-0.91/src/engine/font_freetype.c:276: undefined reference to 
`TT_Get_Outline_BBox'
/root/microwindows-0.91/src/engine/font_freetype.c:310: undefined 
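
The TT_* symbols above come from FreeType 1 (libttf), which the link line never 
pulls in; a guess at a corrected command (library names and locations are 
assumptions about this particular build):

gcc -O -o mtest2 mtest2.c -L/usr/X11R6/lib -lmwin -lmwimages -lttf -lXft -lX11 -lm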

Fw: [nanogui] window's help

2008-08-20 Thread RAHUL RAVINDRAN


--- On Wed, 20/8/08, RAHUL RAVINDRAN [EMAIL PROTECTED] wrote:
From: RAHUL RAVINDRAN [EMAIL PROTECTED]
Subject: [nanogui] window's help
To: nanogui nanogui@linuxhacker.org
Date: Wednesday, 20 August, 2008, 7:19 PM

Sir,
 In nano-X, how can I create a window which contains a text box and a button, 
and which does not call the nano-X server immediately after it executes?
  i.e. 
   I want the nano-X server to be called first, and only after some time, or on 
any event, should the window with the text box and button (with the help of the 
Windows API) be shown.

please help me.






[nanogui] how to compile and link win32 api in linux

2008-08-19 Thread RAHUL RAVINDRAN
Sir,
   I went through the examples in microwindow.90/demos/mwin.
   In those, I went through mtest2.c and mtest.c, which use the window API.
   I want to know how they are compiled and linked.
  Can you give the command with which I can compile and link my file, which uses 
  the win32 API?
  Because I created my own file, but while linking there are errors:

   vcdemo1.c:(.text+0xa): undefined reference to `GetDesktopWindow'
vcdemo1.c:(.text+0x1e): undefined reference to `GetWindowRect'
vcdemo1.c:(.text+0x42): undefined reference to `MwRegisterButtonControl'
vcdemo1.c:(.text+0x4e): undefined reference to `MwRegisterEditControl'
vcdemo1.c:(.text+0x5a): undefined reference to `MwRegisterStaticControl'
vcdemo1.c:(.text+0x9d): undefined reference to `GetStockObject'
vcdemo1.c:(.text+0xbc): undefined reference to `RegisterClass'
vcdemo1.c:(.text+0x11e): undefined reference to `CreateWindowEx'
vcdemo1.c:(.text+0x19a): undefined reference to `CreateWindowEx'
vcdemo1.c:(.text+0x217): undefined reference to `CreateWindowEx'
vcdemo1.c:(.text+0x290): undefined reference to `CreateWindowEx'
vcdemo1.c:(.text+0x2a2): undefined reference to `ShowWindow'
vcdemo1.c:(.text+0x2ad): undefined reference to `UpdateWindow'
vcdemo1.c:(.text+0x2ba): undefined reference to `TranslateMessage'
vcdemo1.c:(.text+0x2c5): undefined reference to `DispatchMessage'
vcdemo1.c:(.text+0x2e8): undefined reference to `GetMessage'
vcdemo1.o: In function `wproc':
vcdemo1.c:(.text+0x317): undefined reference to `PostQuitMessage'
vcdemo1.c:(.text+0x339): undefined reference to `DefWindowProc'

please help.
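
Those undefined symbols are all Microwindows win32-API entry points, so the file 
has to be linked against the mwin libraries, much like the mtest2.c demo; a guess 
at a link line (the library path is an assumption):

gcc -o vcdemo1 vcdemo1.c -L/usr/lib -lmwin -lmwimages -lm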



