Re: CPE memory usage

2016-08-29 Thread Jens Grivolla
Hi Armin, glad I could help. Getting all IDs first also avoids problems
with changing data which could mess with the offsets. This way you have a
fixed snapshot of all existing documents (at the beginning).
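
A minimal sketch of why the snapshot helps, using an in-memory dict as a stand-in for the Solr index. The helpers fetch_page and fetch_by_id are hypothetical names for illustration, not SolrJ calls. Offset paging skips a document when an earlier one is deleted mid-run, while reading from an ID list taken up front does not:

```python
# In-memory stand-in for a Solr index: id -> document text.
# fetch_page and fetch_by_id are hypothetical helpers, not real Solr APIs.

def fetch_page(index, start, rows):
    """Offset paging: return up to `rows` ids from position `start`, sorted by id."""
    return sorted(index)[start:start + rows]

def fetch_by_id(index, doc_id):
    """Lookup by id; returns None if the document no longer exists."""
    return index.get(doc_id)

def read_with_offsets(index, rows, mutate_after_first_page):
    """Page through the index by offset; the index changes after page one."""
    seen, start = [], 0
    while True:
        page = fetch_page(index, start, rows)
        if not page:
            return seen
        seen.extend(page)
        start += rows
        if start == rows:  # exactly one page has been read so far
            mutate_after_first_page(index)

def read_with_id_snapshot(index, mutate_after_first_doc):
    """Take all ids first, then fetch each document by id."""
    ids = fetch_page(index, 0, len(index))  # snapshot of all ids up front
    seen = []
    for position, doc_id in enumerate(ids):
        if fetch_by_id(index, doc_id) is not None:
            seen.append(doc_id)
        if position == 0:
            mutate_after_first_doc(index)
    return seen

def delete_doc_a(index):
    index.pop("a", None)

def make_index():
    return {key: "text " + key for key in "abcdef"}

offset_result = read_with_offsets(make_index(), 2, delete_doc_a)
snapshot_result = read_with_id_snapshot(make_index(), delete_doc_a)
print(offset_result)    # document "c" is missing: the offsets shifted
print(snapshot_result)  # every document that existed at the start is visited
```

In this toy setup the deletion shifts every later document one position to the left, so the second offset page starts one document too late.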

Best,
Jens

On Mon, Aug 29, 2016 at 8:12 AM,  wrote:

> Hi Jens,
>
> I just want to confirm your information. As you said, the query gets
> slower the larger start is, even using filters. The best solution is to get
> all ids first (may take some time), and then to get each document by id
> successively. There is a request handler (get) and a Java API method
> (HttpSolrClient.getById()) to do so.
>
> Thanks to your help, I have consistently fast queries now.
>
> Cheers,
> Armin
>
> -Original Message-
> From: j...@grivolla.net [mailto:j...@grivolla.net] On Behalf Of Jens
> Grivolla
> Sent: Tuesday, August 16, 2016 13:34
> To: user@uima.apache.org
> Subject: Re: CPE memory usage
>
> Solr is known not to be very good at deep paging; it is better at getting
> the top relevant results. Running a query asking for the millionth document is
> pretty much the worst you can do as it will have to rank all documents
> again, up to the millionth, and return that one. It can also be unreliable
> if your document collection changes.
>
> We did get it to work quite well, though. I believe we used only filters
> and retrieved the results in natural order, so that Solr wouldn't have to
> rank the documents. We also had a version where we first retrieved all
> matching document ids in one go, and then queried for the documents by id,
> one by one, in getNext().
>
> Deep paging has also seen some major improvements over time IIRC, so newer
> Solr versions should perform much better than the ones from a few years
> ago.
>
> Best,
> Jens
>
> On Tue, Aug 9, 2016 at 12:20 PM,  wrote:
>
> > Hi!
> >
> > Finally, it looks like Solr causes the high memory consumption. The
> > SolrClient isn't meant to be used the way I used it, but that isn't
> > documented either. The Solr documentation is very bad. I just happened
> > to find a solution on the web.
> >
> > Thanks,
> > Armin
> >
> > -Original Message-
> > From: Richard Eckart de Castilho [mailto:r...@apache.org]
> > Sent: Monday, August 8, 2016 15:33
> > To: user@uima.apache.org
> > Subject: Re: CPE memory usage
> >
> > Do you have code for a minimal test case?
> >
> > Cheers,
> >
> > -- Richard
> >
> > > On 08.08.2016, at 15:31,  <armin.weg...@bka.bund.de> wrote:
> > >
> > > Hi Richard!
> > >
> > > I've changed the document reader to a kind of no-op reader that always
> > > sets the document text to an empty string: same behavior, but a much
> > > slower increase in memory usage.
> > >
> > > Cheers,
> > > Armin
> >
> >
>


Re: UIMA class?

2016-08-29 Thread Richard Eckart de Castilho
Hi Sean,

there have been occasional hands-on tutorials that I know of, one of which is
linked here:
 
  https://uima.apache.org/gscl13.html

That's one that Peter (Ruta) and I (uimaFIT/DKPro Core) did IRL.

But I have seen slides here and there on Google from other conference
tutorials, and I believe also university lecture slides that make use
of UIMA, e.g.

  http://de.slideshare.net/otisg/uima
  http://www3.cs.stonybrook.edu/~pfodor/courses/CSE392/L1-UIMA.pdf

(For me, the two links above currently don't work... but I think this is due
to my network connection rather than to them being offline... I hope ;) )

Cheers,

-- Richard

> On 29.08.2016, at 17:31, Sean Crist  wrote:
> 
> 
> Hi,
> 
> I’ve found that UIMA has such a learning curve to it that I’m wondering if 
> anyone ever offers classes in it (real-life, not YouTube).  If the instructor 
> really knew what they were doing, I’d pay to travel somewhere for a week to 
> get some intensive training in it.
> 
> Failing that, I’m wondering whether there are any learning resources other 
> than just sitting down with the manual and trying stuff (which is what I’ve 
> been doing).  That’s how I normally learn new systems, but this is a 
> particularly tough one to learn without any sort of mentoring.
> 
> —Sean
> 
> 



Re: UIMA class?

2016-08-29 Thread Debbie Zhang
I think it would be very helpful if someone could make some YouTube videos.

Regards,
Debbie

> On 30 Aug 2016, at 2:14 AM, Marshall Schor  wrote:
> 
> did you see the "getting started" part of the docs for UIMA?
> 
> e.g. http://uima.apache.org/documentation.html#getting_started
> 
> -Marshall
>> On 8/29/2016 11:31 AM, Sean Crist wrote:
>> Hi,
>> 
>> I’ve found that UIMA has such a learning curve to it that I’m wondering if 
>> anyone ever offers classes in it (real-life, not YouTube).  If the 
>> instructor really knew what they were doing, I’d pay to travel somewhere for 
>> a week to get some intensive training in it.
>> 
>> Failing that, I’m wondering whether there are any learning resources other 
>> than just sitting down with the manual and trying stuff (which is what I’ve 
>> been doing).  That’s how I normally learn new systems, but this is a 
>> particularly tough one to learn without any sort of mentoring.
>> 
>> —Sean
> 


Re: UIMA class?

2016-08-29 Thread Marshall Schor
did you see the "getting started" part of the docs for UIMA?

e.g. http://uima.apache.org/documentation.html#getting_started

-Marshall
On 8/29/2016 11:31 AM, Sean Crist wrote:
> Hi,
>
> I’ve found that UIMA has such a learning curve to it that I’m wondering if 
> anyone ever offers classes in it (real-life, not YouTube).  If the instructor 
> really knew what they were doing, I’d pay to travel somewhere for a week to 
> get some intensive training in it.
>
> Failing that, I’m wondering whether there are any learning resources other 
> than just sitting down with the manual and trying stuff (which is what I’ve 
> been doing).  That’s how I normally learn new systems, but this is a 
> particularly tough one to learn without any sort of mentoring.
>
> —Sean
>
>
>



UIMA class?

2016-08-29 Thread Sean Crist

Hi,

I’ve found that UIMA has such a learning curve to it that I’m wondering if 
anyone ever offers classes in it (real-life, not YouTube).  If the instructor 
really knew what they were doing, I’d pay to travel somewhere for a week to get 
some intensive training in it.

Failing that, I’m wondering whether there are any learning resources other than 
just sitting down with the manual and trying stuff (which is what I’ve been 
doing).  That’s how I normally learn new systems, but this is a particularly 
tough one to learn without any sort of mentoring.

—Sean




Re: Debugging a NullPointerException in UIMA AS / processing timeouts

2016-08-29 Thread Jaroslaw Cwiklik
Egbert, thanks. I forgot to ask, what version of UIMA-AS are you using?
Also, are you using the sendCAS() or the sendAndReceive() API?

Have a great vacation!

-jerry

On Sun, Aug 28, 2016 at 9:39 AM, Egbert van der Wal 
wrote:

> Hi Jerry,
>
> Thanks for the suggestion. I have the feeling that it's a race condition,
> too, but since I'm doing no multi-threading myself, basically all the
> threading and synchronization should be UIMA-internal.
>
> Anyway, I'll have to postpone researching the issue due to going on
> vacation. When I get back I'll try to get more information with an increased
> log level, and get back to you.
>
> Thanks again!
>
> Regards,
>
> Egbert
>
>
> On 25-8-2016 at 17:17, Jaroslaw Cwiklik wrote:
>
>> Hi, I have a feeling that there might be a race condition here. In the
>> client, the timer pops and at the same time a reply is received.
>> The timeout logic is resetting the CAS while it's being deserialized, which
>> may lead to an NPE. Not 100% certain, but this might be the problem.
>>
>> Any chance you can increase the UIMA log level to FINEST on the client side?
>> It would log important information like the internal CAS ID on each reply,
>> which can be used to correlate events in the log.
>>
>> -jerry
>>
>> On Thu, Aug 25, 2016 at 10:18 AM, Egbert van der Wal 
>> wrote:
>>
>>> Hi,
>>>
>>> I'm having a problem using UIMA-AS. I have a pipeline set up that
>>> processes HTML documents in about 10 ms. The total timeout value was
>>> initially 20 seconds, but I increased it to 120 seconds at some point to
>>> avoid this problem, and it seemed to help.
>>>
>>> However, sometimes the 2 minutes is still hit and a warning is shown.
>>> When this occurs, it is usually accompanied by NullPointerExceptions
>>> involving Xerces, somewhere in the internals of UIMA. See the attached
>>> log-file excerpt for the errors I'm seeing. The first 5 lines are the
>>> 'normal' output, which was repeated for several thousand lines beforehand
>>> during the successful operation of the pipeline.
>>>
>>> The SOFA that is being sent out during this particular exception is a
>>> quite small HTML document, just a couple of kilobytes, and it's not
>>> actually reproducible with the same document; if I run the program again
>>> it will eventually fail, but at some other point.
>>>
>>> How can I go about solving this issue? Since the part of my own code in
>>> the stacktrace is limited to the point where 'sendCAS' is called, I can't
>>> really think of any additional debugging I can do.
>>>
>>> Any suggestions are highly appreciated!
>>>
>>> Thanks,
>>>
>>> Egbert van der Wal
>>>
>>>
>>


Does anybody have a copy of Ed Loper's "pycas"?

2016-08-29 Thread Richard Eckart de Castilho
Hi all,

some years back, Ed Loper had announced a Python-based CAS implementation
and apparently had made it available through his homepage. However, the
link does not work anymore.

Did anybody download the code, and could you provide a copy of it?

(Btw. I already checked the Internet Archive, but it doesn't seem to
have archived that particular page.)

Best,

-- Richard


[1] 
https://mail-archives.apache.org/mod_mbox/uima-dev/200802.mbox/%3ccd6edfd30802181653m26bf3873yf80590be76142...@mail.gmail.com%3E

Re: Read CAS in node.js

2016-08-29 Thread Asher Stern
Yes, I agree.
What I actually need is a CAS deserialization library in node.js.
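
A rough sketch of what such a reader has to do, assuming the CAS is serialized as XMI (an assumption; the format is still an open question in this thread). It is written in Python only for brevity; the same steps should carry over to any XML parser in node.js. The XMI text below is hand-written and simplified, the custom:Token type and its namespace are made up, and begin/end are treated as plain character indices:

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified XMI-style CAS for illustration only.
# The custom:Token type and its namespace are invented for this sketch.
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:custom="http:///org/example/custom.ecore"
         xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text/plain" sofaString="Hello UIMA"/>
  <custom:Token xmi:id="2" sofa="1" begin="0" end="5"/>
  <custom:Token xmi:id="3" sofa="1" begin="6" end="10"/>
  <cas:View sofa="1" members="2 3"/>
</xmi:XMI>
"""

XMI_NS = "{http://www.omg.org/XMI}"
CAS_NS = "{http:///uima/cas.ecore}"

def read_annotations(xmi_text):
    """Return (type name, covered text) for every element with begin/end/sofa."""
    root = ET.fromstring(xmi_text)
    # Map each sofa's xmi:id to its document text.
    sofas = {sofa.get(XMI_NS + "id"): sofa.get("sofaString", "")
             for sofa in root.findall(CAS_NS + "Sofa")}
    result = []
    for elem in root:
        begin, end, sofa = elem.get("begin"), elem.get("end"), elem.get("sofa")
        if begin is None or end is None or sofa not in sofas:
            continue  # not an annotation (e.g. cas:NULL, cas:Sofa, cas:View)
        type_name = elem.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        result.append((type_name, sofas[sofa][int(begin):int(end)]))
    return result

annotations = read_annotations(XMI)
print(annotations)
```

A real reader would also need the type system to interpret feature values and references between feature structures; this sketch only recovers annotation spans.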



2016-08-29 11:58 GMT+03:00 Richard Eckart de Castilho :

> Well, then the question is in which format you have your CAS...
>
> Some time back, a basic JSON serialization module was added to UIMA.
> That might help you sending a CAS from Java to node.js. However,
> the JSON deserialization on the Java side is still missing.
>
> Cheers,
>
> -- Richard
>
> > On 29.08.2016, at 10:54, Asher Stern  wrote:
> >
> > Hi Richard.
> > Thanks very much!
> > However, we strongly prefer a solution which does not require this
> > interoperability of calling Java from node.js. So if such a solution is
> > available, it would be very helpful.
> >
> > Thanks a lot.
> > Asher
> >
> > 2016-08-29 11:45 GMT+03:00 Richard Eckart de Castilho :
> >
> >> Hi Asher,
> >>
> >> if you can call Java code from node.js, then check out the CasIOUtil
> >> class in uimaFIT or wait a couple of days and then check out the
> >> brand new CasIOUtils class in UIMAJ-SDK.
> >>
> >> Cheers,
> >>
> >> -- Richard
> >>
> >>> On 28.08.2016, at 11:11, Asher Stern  wrote:
> >>>
> >>> Hi.
> >>> I have a question.
> >>>
> >>> Are there any good methods to read CAS from within node.js?
> >>>
> >>> Thanks in advance,
> >>> Asher
>
>


Re: Read CAS in node.js

2016-08-29 Thread Asher Stern
Hi Richard.
Thanks very much!
However, we strongly prefer a solution which does not require this
interoperability of calling Java from node.js. So if such a solution is
available, it would be very helpful.

Thanks a lot.
Asher




2016-08-29 11:45 GMT+03:00 Richard Eckart de Castilho :

> Hi Asher,
>
> if you can call Java code from node.js, then check out the CasIOUtil
> class in uimaFIT or wait a couple of days and then check out the
> brand new CasIOUtils class in UIMAJ-SDK.
>
> Cheers,
>
> -- Richard
>
> > On 28.08.2016, at 11:11, Asher Stern  wrote:
> >
> > Hi.
> > I have a question.
> >
> > Are there any good methods to read CAS from within node.js?
> >
> > Thanks in advance,
> > Asher
>
>


Re: Read CAS in node.js

2016-08-29 Thread Richard Eckart de Castilho
Hi Asher,

if you can call Java code from node.js, then check out the CasIOUtil
class in uimaFIT or wait a couple of days and then check out the
brand new CasIOUtils class in UIMAJ-SDK.

Cheers,

-- Richard

> On 28.08.2016, at 11:11, Asher Stern  wrote:
> 
> Hi.
> I have a question.
> 
> Are there any good methods to read CAS from within node.js?
> 
> Thanks in advance,
> Asher