I'm happy to let the Java PP testing slide to 1.5.0.

There are some recent improvements in the ruby PP that I need to implement.
* sakaidocs - (easy, call out to wkhtmltopdf)
* image previews in the same format as the original
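
For the sakaidocs item, a minimal sketch of the call-out, assuming the wkhtmltopdf binary is on the PATH and the doc has already been written out as an HTML file. The class and method names here are hypothetical, not taken from the actual bundle:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the real preview processor.
class SakaiDocPdfSketch {

    // Builds the basic command line: wkhtmltopdf <input> <output>.
    static List<String> buildCommand(String inputHtml, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", inputHtml, outputPdf);
    }

    // Runs wkhtmltopdf as an external process and returns its exit code.
    // Assumes the binary is installed and on the PATH.
    static int render(String inputHtml, String outputPdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inputHtml, outputPdf))
                .inheritIO() // pass wkhtmltopdf's output through to our console
                .start();
        return process.waitFor();
    }
}
```

Something like SakaiDocPdfSketch.render("doc.html", "doc.pdf") would then leave the PDF next to the HTML, and a non-zero exit code could be treated as a failed preview.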

Erik

On Tue, Jul 24, 2012 at 10:18 AM, Kent Fitzgerald <kentf...@umich.edu> wrote:
> Several questions/comments.
> There has already been a 1.4.1 release proposed to immediately follow
> 1.4.0 that would be isolated to code reformatting. Which would take
> precedence?
>
> We should definitely do a bug bash. One of the dangers of doing a bug bash
> focused on the preview processor is that we'll likely have people uploading
> hundreds of files each. Subjectively, this could give the impression of
> decreased performance just because we're hitting it much harder.
>
> More importantly, in addition to the bug bash, we need to do controlled
> tests of processing time on different data types. I'd like to break it down
> by file type and have truly controlled tests. In addition to different file
> types, we'll need files of varying sizes to compare performance not just on
> quantity but on complexity. This needs to be compared to the performance of
> the current implementation.
>
> I think we all agree that this is an important feature that we shouldn't try
> to rush out the door.
>
> I have to read back through the thread, but is there set-up documentation?
> Currently we have a section on the OAE Configuration and Deployment page [1]
> for the preview processor. It contains multiple supporting external links
> that have proven confusing for many people trying to get the preview
> processor running locally. We'll need to make sure we have adequate
> documentation.
>
> As a side note, I will be out of the office starting this Friday through
> next week.
>
>
> [1]
> https://confluence.sakaiproject.org/display/3AK/OAE+Configuration+and+Deployment
>
>
>
> --
> Kent Fitzgerald
>
> On Tuesday, July 24, 2012 at 9:51 AM, Nicolaas Matthijs wrote:
>
> Looks like this has been hanging around on list for a while now, and we
> should probably try to move it forwards.
>
> The maintainability criterion can only be determined by a code review, which
> is standard practice. However, as this is proving to be such a critical
> feature in production, I'd suggest that we do a separate bugbash to evaluate
> its performance, ease of setup (running from a separate machine) and most
> importantly functional equivalence.
>
> When doing this, Kent can give his assessment of the ease of setup and the
> bugbashers can determine functional equivalence. We should also try to have
> it re-process the dummy content we usually bugbash with.
>
> If this all sounds good, I'd like to go ahead with this as soon as possible
> and run a bugbash straight after the 1.4.0 release with all of this set up.
> If the implementation survives the bugbash, it can be reviewed and merged.
>
> Does that sound reasonable?
>
> Thanks,
> Nicolaas
>
>
>
> On 23 Jul 2012, at 07:42, Carl Hall wrote:
>
> Lance, I think the work is already split the way you suggest given what I
> know about what Erik has done (rewrite in Java) and what's left (add JMS).
> Adding message queue capabilities should not hold back reviewing the
> proposed changes.
>
> I would say that it needs to meet these opening criteria for my general
> acceptance:
>
> * Be functionally equal with the current solution
> * A combination of performance and maintainability
>    * Performance can be no worse overall. There might be different hotspots in
> the java version than the current ruby solution but there shouldn't be
> anything exponentially worse. Overall, the java version has to perform at
> least as well and hopefully better. Memory usage, overall processing time,
> resource usage (iops, disc reads, caching) should all be considered.
>    * Be more maintainable than the Ruby solution. The current code has had
> very little cleaning and is not very readable. This includes using
> externally available libraries where possible. We shouldn't be maintaining
> plumbing not inherent to our domain.
> * Be easier to set up. Though our current setup for the ruby PP is known to be
> problematic, we at least are accustomed to it. The proposed solution has got
> to be more straightforward and less fragile.
>
> The numbers I've seen from some preliminary testing showed the Java impl to
> take exponentially *less* time to process pdfs and was faster than the ruby
> PP in every test. It's an OSGi bundle written in Java like the rest of
> our project, which makes it easier to set up and maintain as we write far more
> java code than ruby. I believe there's also already a setup available to run
> the java PP as a standalone server.
> The Java version introduces a topia term extractor bundle which is a port
> from the Python version. This is a point of maintenance to consider but the
> python code has not changed in years. It's a common impl for other languages to
> port but there wasn't a java version around. I would like to see this code
> find a permanent home in a relevant OSS project. At the very least it should
> be maintained apart from OAE core to make it available to a broader
> audience.
>
> +1 to getting this code wrapped up and reviewed.
>
> On Wed, Jul 18, 2012 at 1:51 PM, Christian Vuerings
> <vueringschrist...@gmail.com> wrote:
>
> I'm not sure whether this is already part of the criteria list or not, but
> what about CPU/Memory usage?
> Is there a way we can measure that and compare it to the current ruby based
> PP?
> When I currently run the ruby PP locally, it's usually one of the processes
> that uses the most resources.
>
> One other thing I'm curious about is how well it will compress/handle the
> different file formats (png/jpg/gif/psd)
>
> These are just 2 things that I'm interested in since they (can) have an
> impact on the overall performance.
>
>
> - Christian
>
> On Jul 18, 2012, at 12:41 PM, Lance Speelmon wrote:
>
> Does anyone have an opinion about adopting the new java based PP?
> Specifically can you articulate acceptance criteria for such an adoption?
> e.g.
>
> * Must support same preview behaviors as existing ruby-based PP.
> * Must pass QA with all blocker and critical items resolved.
> * Must start automatically OOTB to support the tire-kicking, web-start uses.
> * Must leverage as much 3rd party code as possible to minimize ownership
> costs.
> * Must pass code review.
> * Unit test code coverage.
> * Basic config and deployment documentation.
>
>
> What is missing?  Anything?  Thanks, L
>
>
>
> On Jul 17, 2012, at 2:58 PM, Lance Speelmon <la...@rsmart.com> wrote:
>
> Is there any way to break this work down into chunks?  e.g.
>
> 1. Adopt java PP as default PP moving forward. What are the acceptance
> criteria?
> 2. Enhance new java PP with message queue abilities.
>
> WDYT?  Thanks, L
>
> On Jul 17, 2012, at 8:34 AM, Carl Hall <c...@hallwaytech.com> wrote:
>
> Each app server could run its own queues but that wouldn't support building
> a farm of PP processors unless we also teach them to talk to multiple JMS
> servers. Maybe something like DNS round-robin would suffice?
>
> On Tue, Jul 17, 2012 at 8:25 AM, Erik Froese <erik.fro...@gmail.com> wrote:
>
> Do we need to cluster activemq? Can't each app server service its own
> queues?
> Erik
>
> On Tue, Jul 17, 2012 at 11:23 AM, Carl Hall <c...@hallwaytech.com> wrote:
>> What Erik describes has been on the dev wish list for a little while now.
>> Moving to an event-driven model would allow us to build out concurrency but
>> there also comes the question of clustering ActiveMQ.
>>
>>
>> On Thu, Jul 12, 2012 at 6:27 AM, Erik Froese <erik.fro...@gmail.com>
>> wrote:
>>>
>>> Hey David,
>>>
>>> The code is not clustered.
>>>
>>> You'd need to write an event listener that would fire when new content
>>> is uploaded. It would put the content ids on a JMS queue. Then
>>> implement a ContentFetcher that grabs a message off of the queue and
>>> wire that into the PPI. Events and Messages are not clustered in OAE
>>> (AFAIK) so this would have to be run on each app server.
>>>
>>> While we're in event-land it'd be nice to be able to regenerate a
>>> preview when a content body is updated. I'm not sure if this is
>>> possible yet.
>>>
>>> I'm not sure how we'd limit the CPU usage yet either. You could manage
>>> the quartz schedule that runs the PPI.
>>>
>>> We can also disable concurrent executions of the job.
>>>
>>> Erik
>>>
>>> On Wed, Jul 11, 2012 at 8:44 PM, Roma, David <dr...@csu.edu.au> wrote:
>>> > Awesome news Erik!
>>> >
>>> > Our Ops guys will be stoked when we can get this in. A couple of
>>> > questions from someone who hasn't looked at the code or read too
>>> > deeply:
>>> > - Does it support clustering?
>>> >         - e.g. can we just run it side by side on each of our app
>>> > servers and they will play nice sharing out processing jobs?
>>> >         - Will it affect performance of the app servers much? Can we
>>> > limit the preview processor to say 10% CPU and 500 MB RAM, or low priority
>>> > threads, or limit the number of items to process or something? This
>>> > would make for a nice simple deployment that wouldn't threaten the app
>>> > server stability.
>>> >
>>> > Cheers,
>>> > Dave.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: oae-dev-boun...@collab.sakaiproject.org
>>> > [mailto:oae-dev-boun...@collab.sakaiproject.org] On Behalf Of Erik
>>> > Froese
>>> > Sent: Thursday, 12 July 2012 2:37 AM
>>> > To: Carl Hall
>>> > Cc: oae-dev@collab.sakaiproject.org; Clay Fenlason
>>> > Subject: Re: [oae-dev] Moving the preview processor to java
>>> >
>>> > Hey everyone,
>>> >
>>> > It's been a few months but I actually implemented the Java preview
>>> > processor as an OSGi bundle. I filed a ticket for it [1].
>>> >
>>> > I'm not sure where to go from here. Is this something that could be
>>> > included POST 1.4.0?
>>> > Should I open a PR so we can review the code? If so, PR against which
>>> > branch?
>>> >
>>> > Either way, have a look, give it a go. We'll probably wind up using it
>>> > at rSmart.
>>> >
>>> > Erik
>>> >
>>> > [1] https://jira.sakaiproject.org/browse/KERN-3021
>>> >
>>> >
>>> >
>>> > On Tue, Apr 17, 2012 at 9:09 AM, Carl Hall <c...@hallwaytech.com>
>>> > wrote:
>>> >> I totally agree that we should ally ourselves with other communities. I
>>> >> see where we get docsplit from DocumentCloud [1] and we use several other
>>> >> libraries for processing that they've most likely contributed to.
>>> >> The Java approach is very little custom code compared to the libraries
>>> >> we're getting from Apache (tika, sanselan, commons, pdfbox), so we would
>>> >> still be building on the shoulders of our friendly community giants.
>>> >>
>>> >> [1] https://github.com/documentcloud/docsplit
>>> >>
>>> >>
>>> >>
>>> >> On Sat, Apr 14, 2012 at 5:43 AM, John Norman <j...@caret.cam.ac.uk>
>>> >> wrote:
>>> >>>
>>> >>> My recollection (perhaps wrong) is that we got this from Document Cloud
>>> >>> and I /think/ Chris Roby found it. Document Cloud seems a very relevant
>>> >>> and valuable project. If we were able to help them while helping
>>> >>> ourselves, other good things could come from the relationship. My
>>> >>> general point is that we are thin on resources and so, in principle,
>>> >>> symbiotic relationships are helpful.
>>> >>>
>>> >>> http://www.documentcloud.org/home
>>> >>>
>>> >>> John
>>> >>>
>>> >>> Sent from my iPad
>>> >>>
>>> >>> On 13 Apr 2012, at 17:03, Carl Hall <c...@hallwaytech.com> wrote:
>>> >>>
>>> >>> I agree with Daniel that our modifications to the preview processor
>>> >>> have put its ownership square on us. Was there a community that this
>>> >>> script was borrowed from? I thought it was original development that
>>> >>> uses various external libraries to do the actual work. This is the
>>> >>> approach that Erik is taking with the rewrite using things like Tika
>>> >>> (text extraction), Sanselan (images) and a Java port of the python
>>> >>> topia.termextract library.
>>> >>>
>>> >>> I certainly don't deny the speed of development that was realized in
>>> >>> creating the PP but the current state of the code is a mess at best.
>>> >>> Reuse of libraries in Java is showing a fast rewrite with very little
>>> >>> managed code on our part.
>>> >>>
>>> >>>
>>> >>> On Fri, Apr 13, 2012 at 12:50 AM, Daniel Parry
>>> >>> <dan...@caret.cam.ac.uk>
>>> >>> wrote:
>>> >>>>
>>> >>>> On Thu, Apr 12, 2012 at 04:21:36PM -0400, Clay Fenlason wrote:
>>> >>>> > I think this response is at best orthogonal to the point John's
>>> >>>> > trying to raise, though I gather this kind of reaction must come
>>> >>>> > from a buildup of some real frustration around the PP, which I
>>> >>>> > don't mean to discount. I also think John was pretty clear about
>>> >>>> > what he was suggesting: that there be a conversation with the
>>> >>>> > community we got the PP from, if the conversation hasn't happened
>>> >>>> > already, to see if there might still be a way to work together
>>> >>>> > before we decide to just own it ourselves.
>>> >>>>
>>> >>>> I'd suggest the way that the preview processor was being extended
>>> >>>> (initially a python server add on, followed by a ruby rewrite for tag
>>> >>>> extraction) and the variety of ruby versions that deployers were
>>> >>>> using and the methods used to deploy it were indicative of a) the OAE
>>> >>>> community already 'owning' the PP and b) as has already been pointed
>>> >>>> out some standardization needed restoring and additional
>>> >>>> functionality added for deployers.  Hence, the list was pinged[0] a
>>> >>>> while back to ask about standardizing and extending in java. I'm not
>>> >>>> sure of any other way to contact the original PP community or if such
>>> >>>> a community separate to OAE even still exists?
>>> >>>>
>>> >>>> Best wishes,
>>> >>>>
>>> >>>> Daniel
>>> >>>>
>>> >>>> [0] http://collab.sakaiproject.org/pipermail/oae-dev/2012-April/001677.html
>>> >>>>
>>> >>>> --
>>> >>>> --| Daniel Parry: dan...@caret.cam.ac.uk. www.caret.cam.ac.uk/ |--
>>> >>>> "Of all the things a leader should fear, complacency should
>>> >>>>  head the list." [John C. Maxwell]
>>> >>>> _______________________________________________
>>> >>>> oae-dev mailing list
>>> >>>> oae-dev@collab.sakaiproject.org
>>> >>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> > Charles Sturt University
>>> >
>>
>>
>
>
_______________________________________________
oae-dev mailing list
oae-dev@collab.sakaiproject.org
http://collab.sakaiproject.org/mailman/listinfo/oae-dev