Re: Jenkins build failures after git migration

2016-04-21 Thread Mattmann, Chris A (3980)
thanks Seb!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

On 4/20/16, 3:09 PM, "Sebastian Nagel"  wrote:

>Thanks, the path to JUnit result files and Javadoc is fixed now.
>Jenkins builds (1.x and 2.x) are back to normal.
>
>Sebastian
>
>On 04/18/2016 05:56 PM, Mattmann, Chris A (3980) wrote:
>> Hey Seb, I’ll also take a look. @Lewis could potentially help here
>> too. Lewis any time to scope?
>> 
>> On 4/18/16, 4:40 AM, "Sebastian Nagel"  wrote:
>> 
>>> Hi,
>>>
>>> the last successful builds for both branches
>>> https://builds.apache.org/job/Nutch-trunk/
>>> https://builds.apache.org/job/Nutch-nutchgora/
>>> were in February before the svn to git migration.
>>>
>>> The reason is probably a changed path to the build workspace.
>>> When comparing the logs
>>> https://builds.apache.org/job/Nutch-trunk/3356/consoleText
>>> and
>>> https://builds.apache.org/job/Nutch-trunk/3360/consoleText
>>>
>>> (3356, svn)
>>> Buildfile: /home/jenkins/jenkins-slave/workspace/Nutch-trunk/trunk/build.xml
>>>
>>> (3360, git)
>>> Buildfile: /home/jenkins/jenkins-slave/workspace/Nutch-trunk/build.xml
>>>
>>> Although the ant build succeeds, the XML test reports are not found, which causes
>>> the build to be marked as failed:
>>>
>>> (3360, git)
>>> BUILD SUCCESSFUL
>>> Total time: 12 minutes 37 seconds
>>> [xUnit] [INFO] - Starting to record.
>>> [xUnit] [INFO] - Processing JUnit
>>> [xUnit] [INFO] - [JUnit] - No test report file(s) were found with the pattern
>>> 'trunk/build/test/TEST-*.xml' relative to '/home/jenkins/jenkins-slave/workspace/Nutch-trunk' for
>>> the testing framework 'JUnit'.  Did you enter a pattern relative to the correct directory?  Did you
>>> generate the result report(s) for 'JUnit'?
>>> ...
>>> Finished: FAILURE
>>>
>>>
>>> Does anyone know how to fix this?
>>> I could dig into it later today or tomorrow.
>>>
>>> Thanks,
>>> Sebastian
>
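
The mismatch described in the quoted message can be checked locally. The following is a minimal Java sketch, not the Jenkins xUnit implementation: it uses the JDK's glob matcher as a stand-in for the plugin's pattern matching, and both the report file name and the corrected pattern `build/test/TEST-*.xml` are assumptions made for illustration (the thread only says that the path was fixed, not what it was changed to).

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class ReportPatternCheck {
    /** Returns true if the workspace-relative path matches the glob pattern. */
    static boolean matches(String globPattern, String relativePath) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // Hypothetical report file, relative to the workspace root after the
        // git migration (there is no longer a "trunk/" directory level).
        String report = "build/test/TEST-org.example.SomeTest.xml";

        // Old pattern from the svn layout: no match.
        System.out.println(matches("trunk/build/test/TEST-*.xml", report)); // false
        // Assumed corrected pattern: match.
        System.out.println(matches("build/test/TEST-*.xml", report));       // true
    }
}
```

The point is only that the pattern is resolved relative to the workspace root, so dropping the `trunk/` level from the checkout requires dropping it from the pattern as well.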


[jira] [Commented] (NUTCH-1785) Ability to index raw content

2016-04-21 Thread Federico Bonelli (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251836#comment-15251836 ]

Federico Bonelli commented on NUTCH-1785:
-

I opened a new issue and proposed a patch: NUTCH-2254

> Ability to index raw content
> 
>
> Key: NUTCH-1785
> URL: https://issues.apache.org/jira/browse/NUTCH-1785
> Project: Nutch
>  Issue Type: New Feature
>  Components: indexer
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.11
>
> Attachments: NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, 
> NUTCH-1785-trunk.patch, NUTCH-1785-trunk.patch, NUTCH-1785-trunkv2.patch
>
>
> Some use-cases require Nutch to actually write the raw content to a configured 
> indexing back-end. Since Content is never read, a plugin is out of the 
> question and therefore we need to force IndexJob to process Content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (NUTCH-2254) Charset issues when using -addBinaryContent and -base64 options

2016-04-21 Thread Federico Bonelli (JIRA)
Federico Bonelli created NUTCH-2254:
---

 Summary: Charset issues when using -addBinaryContent and -base64 options
 Key: NUTCH-2254
 URL: https://issues.apache.org/jira/browse/NUTCH-2254
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.11
Reporter: Federico Bonelli
Priority: Minor


The bug is reproducible with these steps:

# find a site with cp1252 encoded pages like "http://www.ilsole24ore.com/" and 
characters with accents (byte representation >127, like [àèéìòù])
# start a crawl on that site indexing on Solr with options -addBinaryContent 
-base64
# find a document inside the newly indexed Solr collection with those accented 
characters
# get the base64 binary representation for said html page and decode it back to 
raw binary, save it

The file obtained will have invalid characters, which are neither UTF-8 nor 
cp1252.
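
The last step can be checked programmatically. The sketch below is an illustration only and does not use any Nutch or Solr API: it assumes the base64 string has already been retrieved from the index (here it is simulated by encoding known cp1252 bytes), and the class and method names are hypothetical. For a lossless pipeline the decoded bytes should equal the original cp1252 bytes, which are not valid UTF-8 but decode cleanly as cp1252.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class Base64ContentCheck {
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes the base64 field value back to raw bytes. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String original = "\u00e0\u00e8\u00e9";          // "àèé"
        byte[] fetched = original.getBytes(CP1252);       // bytes as served by the site
        // Stand-in for the value stored in the index (assumption).
        String field = Base64.getEncoder().encodeToString(fetched);

        byte[] back = decode(field);
        System.out.println(Arrays.equals(back, fetched));              // true
        System.out.println(isValidUtf8(back));                         // false
        System.out.println(new String(back, CP1252).equals(original)); // true
    }
}
```

If the bytes decoded from the real index fail the first and third checks, the content was altered somewhere between fetching and indexing, which is the symptom reported here.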





[jira] [Updated] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-21 Thread Leon Misakyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Misakyan updated NUTCH-2253:
-
Description: 
Hi, as I can see in the 1.11 release, the ProtocolFactory class still has an issue in 
its getProtocol method. This is because every fetcher thread has its own 
ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); in 
the FetcherThread constructor).
So having this method synchronized is useless, because each thread has its own 
monitor.
In our project we have the issue of ending up with multiple Protocol instances.
The issue can be fixed by having the getProtocol method use the shared conf 
instance as the lock object, or by having one ProtocolFactory for all fetcher threads. 


  was:
Hi, as I can see in 1.11 release ProtocolFactory class still has an issue in 
getProtocol method. This is because every fetcher thread has its own 
ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); in 
FetcherThread constructor.)
So have this method synchronized is useless, because each thread has its own 
monitor.
In our project we have issue of having multiple Protocol instances.
Issue can be fixed if getProtocol method will use shared conf instance as lock 
object or by having one ProtocolFactory for all fetcher threads. 



> ProtocolFactory still not thread-safe
> -
>
> Key: NUTCH-2253
> URL: https://issues.apache.org/jira/browse/NUTCH-2253
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.10, 1.11
>Reporter: Leon Misakyan
>
> Hi, as I can see in the 1.11 release, the ProtocolFactory class still has an issue in 
> its getProtocol method. This is because every fetcher thread has its own 
> ProtocolFactory instance (this.protocolFactory = new ProtocolFactory(conf); 
> in the FetcherThread constructor).
> So having this method synchronized is useless, because each thread has its own 
> monitor.
> In our project we have the issue of ending up with multiple Protocol instances.
> The issue can be fixed by having the getProtocol method use the shared conf 
> instance as the lock object, or by having one ProtocolFactory for all fetcher threads. 
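
The first proposed fix (locking on a shared object instead of the per-thread factory) can be sketched as follows. This is an illustration under stated assumptions, not Nutch's code: `Conf`, `Factory`, and the cache inside `Conf` are hypothetical stand-ins, and a plain `Object` stands in for a Protocol instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedLockFactoryDemo {
    /** Hypothetical stand-in for the shared configuration; it also holds the cache. */
    static final class Conf {
        final Map<String, Object> cache = new HashMap<>();
    }

    /** Counts how many protocol objects were actually constructed. */
    static final AtomicInteger created = new AtomicInteger();

    /** Hypothetical per-thread factory; every thread gets its own instance. */
    static final class Factory {
        private final Conf conf;

        Factory(Conf conf) {
            this.conf = conf;
        }

        Object getProtocol(String scheme) {
            // Lock on the shared conf, not on "this": a synchronized instance
            // method would use a different monitor in every thread.
            synchronized (conf) {
                Object protocol = conf.cache.get(scheme);
                if (protocol == null) {
                    protocol = new Object();
                    created.incrementAndGet();
                    conf.cache.put(scheme, protocol);
                }
                return protocol;
            }
        }
    }

    /** Starts the given number of threads, each with its own Factory. */
    static int run(int threads) {
        created.set(0);
        Conf conf = new Conf();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> new Factory(conf).getProtocol("http"));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(run(8)); // 1
    }
}
```

Because the check-then-create sequence runs under one monitor shared by all threads, only one instance is created per scheme regardless of how many factories exist. Sharing a single factory across all fetcher threads, the second option in the description, would achieve the same effect with the existing synchronized method.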





[jira] [Updated] (NUTCH-2253) ProtocolFactory still not thread-safe

2016-04-21 Thread Leon Misakyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Misakyan updated NUTCH-2253:
-
Fix Version/s: (was: 1.8)
   (was: 2.3)

> ProtocolFactory still not thread-safe
> -
>
> Key: NUTCH-2253
> URL: https://issues.apache.org/jira/browse/NUTCH-2253
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.10, 1.11
>Reporter: Leon Misakyan


