[jira] [Resolved] (CONNECTORS-339) We need a test for all of the localized versions of the UI

2012-01-12 Thread Karl Wright (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-339.


Resolution: Fixed

> We need a test for all of the localized versions of the UI
> --
>
> Key: CONNECTORS-339
> URL: https://issues.apache.org/jira/browse/CONNECTORS-339
> Project: ManifoldCF
>  Issue Type: Test
>  Components: Tests
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.5
>
>
> We need a way of testing the UI for functionality, regressions, and properly 
> formed HTML.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CONNECTORS-339) We need a test for all of the localized versions of the UI

2012-01-12 Thread Karl Wright (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184958#comment-13184958
 ] 

Karl Wright commented on CONNECTORS-339:


I've created new tickets for OpenSearchServer and Alfresco, so I can resolve 
this ticket.


> We need a test for all of the localized versions of the UI
> --
>
> Key: CONNECTORS-339
> URL: https://issues.apache.org/jira/browse/CONNECTORS-339
> Project: ManifoldCF
>  Issue Type: Test
>  Components: Tests
>Affects Versions: ManifoldCF 0.5
>Reporter: Karl Wright
>Assignee: Karl Wright
> Fix For: ManifoldCF 0.5
>
>
> We need a way of testing the UI for functionality, regressions, and properly 
> formed HTML.





[jira] [Created] (CONNECTORS-380) Need a UI test for the Alfresco connector

2012-01-12 Thread Karl Wright (Created) (JIRA)
Need a UI test for the Alfresco connector
-

 Key: CONNECTORS-380
 URL: https://issues.apache.org/jira/browse/CONNECTORS-380
 Project: ManifoldCF
  Issue Type: Test
  Components: Alfresco connector
Affects Versions: ManifoldCF 0.5
Reporter: Karl Wright
Assignee: Piergiorgio Lucidi
 Fix For: ManifoldCF 0.5


The Alfresco connector needs a UI test, and needs whatever modifications are 
needed to its UI to make it testable.






[jira] [Created] (CONNECTORS-381) OpenSearchServer connector needs a UI test

2012-01-12 Thread Karl Wright (Created) (JIRA)
OpenSearchServer connector needs a UI test
--

 Key: CONNECTORS-381
 URL: https://issues.apache.org/jira/browse/CONNECTORS-381
 Project: ManifoldCF
  Issue Type: Test
  Components: OpenSearchServer connector
Affects Versions: ManifoldCF 0.5
Reporter: Karl Wright
 Fix For: ManifoldCF 0.5


The OpenSearchServer connector needs a UI test, and needs whatever 
modifications to the UI code are needed to make it testable.






[jira] [Commented] (CONNECTORS-376) Meridio connector's Japanese messages are not fully translated

2012-01-12 Thread Karl Wright (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184952#comment-13184952
 ] 

Karl Wright commented on CONNECTORS-376:


r1230518 is the commit which internationalizes the Meridio connector.


> Meridio connector's Japanese messages are not fully translated
> --
>
> Key: CONNECTORS-376
> URL: https://issues.apache.org/jira/browse/CONNECTORS-376
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Meridio connector
>Affects Versions: ManifoldCF 0.5
>Reporter: Hitoshi Ozawa
>Assignee: Karl Wright
>Priority: Minor
>  Labels: I18N
> Fix For: ManifoldCF 0.5
>
> Attachments: CONNECTORS-376.patch
>
>
> Should translate Meridio connector's Japanese message properties





[jira] [Assigned] (CONNECTORS-379) Merido connector needs to be internationalized

2012-01-12 Thread Karl Wright (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-379:
--

Assignee: Karl Wright

> Merido connector needs to be internationalized
> --
>
> Key: CONNECTORS-379
> URL: https://issues.apache.org/jira/browse/CONNECTORS-379
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Meridio connector
>Affects Versions: ManifoldCF 0.5
>Reporter: Hitoshi Ozawa
>Assignee: Karl Wright
>Priority: Minor
>  Labels: I18N
> Fix For: ManifoldCF 0.5
>
> Attachments: CONNECTORS-379.patch
>
>
> Messages in Merido connector needs to be externalized to properties file.





[jira] [Resolved] (CONNECTORS-376) Meridio connector's Japanese messages are not fully translated

2012-01-12 Thread Karl Wright (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-376.


Resolution: Fixed

> Meridio connector's Japanese messages are not fully translated
> --
>
> Key: CONNECTORS-376
> URL: https://issues.apache.org/jira/browse/CONNECTORS-376
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Meridio connector
>Affects Versions: ManifoldCF 0.5
>Reporter: Hitoshi Ozawa
>Assignee: Karl Wright
>Priority: Minor
>  Labels: I18N
> Fix For: ManifoldCF 0.5
>
> Attachments: CONNECTORS-376.patch
>
>
> Should translate Meridio connector's Japanese message properties





[jira] [Resolved] (CONNECTORS-379) Merido connector needs to be internationalized

2012-01-12 Thread Karl Wright (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-379.


Resolution: Fixed

r1230518


> Merido connector needs to be internationalized
> --
>
> Key: CONNECTORS-379
> URL: https://issues.apache.org/jira/browse/CONNECTORS-379
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Meridio connector
>Affects Versions: ManifoldCF 0.5
>Reporter: Hitoshi Ozawa
>Assignee: Karl Wright
>Priority: Minor
>  Labels: I18N
> Fix For: ManifoldCF 0.5
>
> Attachments: CONNECTORS-379.patch
>
>
> Messages in Merido connector needs to be externalized to properties file.





[jira] [Updated] (CONNECTORS-379) Merido connector needs to be internationalized

2012-01-12 Thread Hitoshi Ozawa (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitoshi Ozawa updated CONNECTORS-379:
-

Attachment: CONNECTORS-379.patch

> Merido connector needs to be internationalized
> --
>
> Key: CONNECTORS-379
> URL: https://issues.apache.org/jira/browse/CONNECTORS-379
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Meridio connector
>Affects Versions: ManifoldCF 0.5
>Reporter: Hitoshi Ozawa
>Priority: Minor
>  Labels: I18N
> Fix For: ManifoldCF 0.5
>
> Attachments: CONNECTORS-379.patch
>
>
> Messages in Merido connector needs to be externalized to properties file.





[jira] [Created] (CONNECTORS-379) Merido connector needs to be internationalized

2012-01-12 Thread Hitoshi Ozawa (Created) (JIRA)
Merido connector needs to be internationalized
--

 Key: CONNECTORS-379
 URL: https://issues.apache.org/jira/browse/CONNECTORS-379
 Project: ManifoldCF
  Issue Type: Improvement
  Components: Meridio connector
Affects Versions: ManifoldCF 0.5
Reporter: Hitoshi Ozawa
Priority: Minor
 Fix For: ManifoldCF 0.5


Messages in Merido connector needs to be externalized to properties file.





Re: Revisiting: Should Manifold include Pipelines

2012-01-12 Thread Karl Wright
Hi Mark,




>
> I'm not sure if this question is revisiting the motivation for preferring
> this in MCF, or a technical question about how to package metadata for
> different engines that might want it in a different format.
>

I'm looking not so much for justification as for enough context to
decide how to structure the code.  Based on what I've heard, it probably
makes the most sense to provide a service available for both
repository connectors and output connectors to use in massaging
content.  The configuration needed for the service would therefore be
managed by the repository connector or output connector which required
the pipeline's services.
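
A minimal sketch of that arrangement, assuming a single shared service
that both kinds of connector call with their own settings.  Every name
here (ContentPipeline, the "maxChars" setting, the two connector
classes) is hypothetical and is not part of the ManifoldCF API.

```java
import java.util.Map;

// Hypothetical sketch: one shared pipeline service, with the configuration
// owned by whichever connector calls it. Not ManifoldCF's actual API.
final class PipelineServiceSketch {

    /** The shared service: transforms content using the caller's settings. */
    interface ContentPipeline {
        String process(String content, Map<String, String> config);
    }

    /** A trivial implementation: collapse whitespace, then truncate if "maxChars" is set. */
    static final ContentPipeline SHARED = (content, config) -> {
        String out = content.trim().replaceAll("\\s+", " ");
        String max = config.get("maxChars"); // assumed setting name
        if (max != null) {
            int limit = Integer.parseInt(max);
            if (out.length() > limit) {
                out = out.substring(0, limit);
            }
        }
        return out;
    };

    /** A repository connector that massages content as it is fetched. */
    static final class RepositoryConnector {
        private final Map<String, String> pipelineConfig;

        RepositoryConnector(Map<String, String> pipelineConfig) {
            this.pipelineConfig = pipelineConfig;
        }

        String fetch(String rawContent) {
            return SHARED.process(rawContent, pipelineConfig);
        }
    }

    /** An output connector that massages content before sending it on. */
    static final class OutputConnector {
        private final Map<String, String> pipelineConfig;

        OutputConnector(Map<String, String> pipelineConfig) {
            this.pipelineConfig = pipelineConfig;
        }

        String prepare(String document) {
            return SHARED.process(document, pipelineConfig);
        }
    }
}
```

The point of the sketch is only that the pipeline holds no configuration
of its own; each connector passes in whatever it has been configured
with.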


> For the latter, how to pass metadata to engines, that's interesting.  One
> almost universal way is to add metadata tags to the header portion of an HTML
> file.  There are some other microformats that some engines understand.
> Could we just assume, for now, that additional metadata will be jammed
> into the HTML header, perhaps with an "x-" prefix for the name (a convention
> some folks like)?
>

I would presume that a Java coder who writes the output connector that
knows how to connect to the given search engine would tackle this
problem in the appropriate way.  I don't think it's a pipeline
question.
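
For illustration only, here is roughly what the "x-" convention Mark
describes could look like if an output connector chose to use it.  The
class and method names are made up for this sketch, and it assumes the
document already has a closing </head> tag.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of injecting "x-" prefixed metadata into an HTML header.
final class MetaTagSketch {

    private static final Pattern HEAD_CLOSE =
        Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    /** Escape the characters that would break a double-quoted HTML attribute. */
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    /**
     * Insert one meta tag per metadata entry just before the first </head>.
     * If the document has no </head>, it is returned unchanged.
     */
    static String injectMeta(String html, Map<String, String> metadata) {
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            return html;
        }
        StringBuilder tags = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            tags.append("<meta name=\"x-").append(escapeAttr(e.getKey()))
                .append("\" content=\"").append(escapeAttr(e.getValue()))
                .append("\">\n");
        }
        int idx = m.start();
        return html.substring(0, idx) + tags + html.substring(idx);
    }
}
```

Whether a given search engine actually reads such tags is up to that
engine, which is why this belongs in the output connector.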

>
> Including Tika would be useful for connectors that need to look at binary
> doc files to do their parsing.  Even if the pipeline then discards Tika's
> output when it's done, it's still a likely expense *if* it meets the
> project objective.
>
> As an example, the current MCF system looks for links in HTML.  But
> hyperlinks can also appear in Word, Excel and PDF files.  Tika could, in
> theory, convert those docs so that they can also be scanned for links, and
> then later discard that converted file.
>

Sure, that's why I'd make the pipeline available to every connector.
The Java code for a connector would be modified, where appropriate, to
use the pipeline when that helps.
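
A rough sketch of the convert-scan-discard idea, assuming the
conversion step is hidden behind a small interface.  TextExtractor is a
hypothetical stand-in for a converter such as Tika, not Tika's real
API, and the regular expression only finds URLs that appear in the
extracted text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: convert a binary document to text, scan the text
// for links, and let the converted text go out of scope afterwards.
final class LinkScanSketch {

    /** Stand-in for a real converter; this interface is an assumption. */
    interface TextExtractor {
        String extractText(byte[] binaryDocument);
    }

    // Deliberately simple: http/https URLs up to whitespace or a closing delimiter.
    private static final Pattern URL =
        Pattern.compile("https?://[^\\s<>\"')]+");

    static List<String> findLinks(byte[] document, TextExtractor extractor) {
        // The converted text is used only here and is never stored.
        String text = extractor.extractText(document);
        List<String> links = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }
}
```

A connector would pass the discovered links back to the framework in
whatever way it already does for HTML links.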

>
> Given the dismal state of open tools, I'd be excited to just see 1:1
> "pipeline" functionality be made widely available.
>
> I'm regretting, to some extent, bringing in the more complex Pipeline logic
> as it may have partially derailed the conversation.  I'm one of the authors
> of the old XPump tool, which was able to do very fancy things, but suffered
> from other issues.
>
> But better to have something now than nothing.  And I'll ponder the more
> complex scenarios some more.
>

I'll talk about this more further down.

>
>>
>> So, my question to you is, what would the main use case(s) be for a
>> "pipeline" in your view?
>>
>
> I've given a couple examples above, of 1:1 transforms.  I *KNOW* this is of
> interest to some folks, but it sounds like I've failed to convince you.
> I'd ask you to take it on faith, but you don't know me very well, so that'd
> be asking a lot.
>

The goal of the question was to confirm that you thought the value of
having a "pipeline" was high enough, vs. building a "Pipeline", as
we've defined it.  I wanted to be sure there was no communication
issue and that we understood one another before anybody went off and
started writing code.

>
> A final question for you Karl, since we've both invested some time in
> discussing something that would normally be very complex to others.  What
> open source tools would YOU suggest I look at, for a new home for uber
> pipeline processing?  I think you understand some of the logical
> functionality I want to model.
>
> Some other wish list items:
> * Leverage MCF connectors
> * A web UI framework for monitoring
>
> I'd say up front that I've considered Nutch, but I don't think it's a good
> fit for other reasons.
>
> I'm still looking around at UIMA.  I keep finding the justification for
> UIMA, how awesome it is, but less on the technical side.  I'm not sure it
> models a data flow design that well.
>
> The other area I looked at was some of the Eclipse process graph stuff,
> "Business Process Management" I think.
>
>
> There's a TON of open source projects.
>

I can't claim to know all the open-source projects out there.  But I'm
unaware of one that really focuses on "Pipeline" building from the
perspective of crawling.

On the other hand, it seems pretty clear to me how one would go about
converting ManifoldCF to a "Pipeline" project.  What you'd get would
be a tool with UI components where you'd either glue the components
together with code, or use an "amalgamation" UI to generate the
necessary data flow.  There may already be tools in this space I don't
know of, but before you'd get to that point you'd want to have all the
technical underpinnings worked out.

The "Pipeline" services you'd want to provide would include functions
that each connector currently performs, but broken out as I'd
described in one of my earlier posts.  The document queue, which is
managed by the ManifoldCF framework right now, would need to be
redesigned since the entire notion of what a job is would require