How to process files in a sorted order

2013-11-07 Thread Konstantinos Mavrommatis
Hi,
In my environment I am using cas-crawler to process directories of 1000s of 
files. The metadata for these files are extracted automatically using the 
mimetypes definitions and small wrapper scripts.
In these directories some of the files are derived from other files, and 
metadata from the older files needs to be transferred to the newer files.
In order to achieve this I need to have the files processed by the cas-crawler 
from the oldest file to the newest, or in other cases in alphabetical order.
Any ideas how this can be achieved?

The crawler command I currently use is:
./crawler_launcher --operation --launchAutoCrawler --productPath $FILEPATH
--filemgrUrl $FMURL --clientTransferer
org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
--mimeExtractorRepo ../policy/mime-extractor-map.xml

Thanks in advance for your help
Konstantinos




Re: How to process files in a sorted order

2013-11-15 Thread Konstantinos Mavrommatis
Thanks Cameron,
I will try to play with it and I will post a solution if I find one.
Konstantinos

On 11/13/13 3:18 PM, "Cameron Goodale"  wrote:

>Konstantinos,
>
>My name is Cameron and I am a committer on the Apache OODT project.  I am
>not familiar with the internals of crawler, but I don't believe there is a
>way to accomplish your goal of enforcing a sorting algorithm within the
>crawler config.  I think you will have to write your own crawler that will
>implement your sorting logic.
>
>
>
>Sincerely,
>
>
>Cameron Goodale
>
>
>On Thu, Nov 7, 2013 at 7:44 PM, Konstantinos Mavrommatis <
>kmavromma...@celgene.com> wrote:
>
>> Hi,
>> In my environment I am using cas-crawler to process directories of 1000s
>> of files. The metadata for these files are extracted automatically using
>> the mimetypes definitions and small wrapper scripts.
>> In these directories some of the files are derived from other files and
>> metadata from the older files need to be transferred to the newer file.
>> In order to achieve this I need to have the files processed by the
>> cas-crawler starting from the older file to the newer file or in other
>> cases in alphabetical order..
>> Any ideas how this can be achieved?
>>
>> The crawler command I currently use is:
>> ./crawler_launcher --operation --launchAutoCrawler --productPath $FILEPATH
>> --filemgrUrl $FMURL --clientTransferer
>> org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>> --mimeExtractorRepo ../policy/mime-extractor-map.xml
>>
>> Thanks in advance for your help
>> Konstantinos
>>
>>
>>
>
>
>-- 
>
>Sent from a Tin Can attached to a String
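
A minimal sketch of the sorting logic such a custom crawler could use, assuming
a plain pre-ingest listing step (the class and method names below are
hypothetical, not OODT API):

    import java.io.File;
    import java.util.Arrays;
    import java.util.Comparator;

    // Hypothetical helper: order a directory's files oldest-first by
    // modification time, breaking ties alphabetically. A custom crawler
    // could walk this list and ingest each file in turn.
    public class SortedFileLister {

        public static File[] listOldestFirst(File dir) {
            File[] files = dir.listFiles();
            if (files == null) {
                return new File[0]; // not a directory, or an I/O error
            }
            Arrays.sort(files, Comparator
                    .comparingLong(File::lastModified)   // oldest first
                    .thenComparing(File::getName));      // then alphabetical
            return files;
        }

        public static void main(String[] args) {
            for (File f : listOldestFirst(new File(args[0]))) {
                System.out.println(f.getName());
            }
        }
    }

Swapping the comparator for Comparator.comparing(File::getName) gives the pure
alphabetical variant mentioned in the original question.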




File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-08 Thread Konstantinos Mavrommatis
Hi,
I have set up the OODT filemanager on port 9000, using SOLR as the indexing 
service on port 8081. They are both set up on the same computer, while the 
crawler runs on a number of different compute nodes spread across the local 
network and the cloud.

When the crawler runs and ingests files I notice that several connections 
open to solr and remain in CLOSE_WAIT state for hours.
Any idea why this happens? Moving forward I am planning to use several hundred 
crawler instances, each running on a different computer, which will create 
thousands of such connections and will probably cause problems for the system.
Thanks in advance for any help
Kostas

 $lsof -i :8081
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java92065 kmavrommatis   75u  IPv6 0x392c3fa3b63b29cf  0t0  TCP 
localhost:49205->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   76u  IPv6 0x392c3fa3b6dcbbcf  0t0  TCP 
localhost:49206->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   77u  IPv6 0x392c3fa39fd12e0f  0t0  TCP 
localhost:49207->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   78u  IPv6 0x392c3fa39fdcdbcf  0t0  TCP 
localhost:49208->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   79u  IPv6 0x392c3fa3b62cde0f  0t0  TCP 
localhost:49209->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   80u  IPv6 0x392c3fa39fa2714f  0t0  TCP 
localhost:49210->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   81u  IPv6 0x392c3fa3b6c32acf  0t0  TCP 
localhost:49211->localhost:sunproxyadmin (CLOSE_WAIT)
java92065 kmavrommatis   82u  IPv6 0x392c3fa3b6aa714f  0t0  TCP 
localhost:49212->localhost:sunproxyadmin (CLOSE_WAIT)


process 92065 is:
 /usr/bin/java -Djava.ext.dirs=../lib 
-Djava.util.logging.config.file=../etc/logging.properties 
-Dorg.apache.oodt.cas.filemgr.properties=../etc/filemgr.properties 
org.apache.oodt.cas.filemgr.system.XmlRpcFileManager --portNum 9000



RE: File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-09 Thread Konstantinos Mavrommatis
Hi Lewis,
Thank you for your quick response:
Information about the system:
OS is MacOS Mavericks, 
the SOLR version is 4.6.1 running on Tomcat 7.0.50,
SOLR runs on a single server, no replication at all.
OODT version is 0.6

The interesting thing is that after a day of 'moderate' activity I now have
198 CLOSE_WAIT instances on port 8081 (which is where SOLR is listening), with 
corresponding instances in state FIN_WAIT_2 on the client side (i.e. the 
filemanager).
I also have another 104 CLOSE_WAIT instances on port 9000, which is where the 
filemanager listens for connections from the compute nodes of the cluster, 
e.g.
java92065 kmavrommatis  363u  IPv6 0x392c3fa39f6adf0f  0t0  TCP 
10.130.0.26:cslistener->cluster:51194 (CLOSE_WAIT)
java92065 kmavrommatis  364u  IPv6 0x392c3fa3b62759cf  0t0  TCP 
10.130.0.26:cslistener->cluster:33123 (CLOSE_WAIT)
java92065 kmavrommatis  365u  IPv6 0x392c3fa3b6d3314f  0t0  TCP 
10.130.0.26:cslistener->cluster:35167 (CLOSE_WAIT)
java92065 kmavrommatis  366u  IPv6 0x392c3fa3b5faff0f  0t0  TCP 
10.130.0.26:cslistener->cluster:46788 (CLOSE_WAIT)
java92065 kmavrommatis  367u  IPv6 0x392c3fa39f6abd0f  0t0  TCP 
10.130.0.26:cslistener->cluster:52658 (CLOSE_WAIT)
java92065 kmavrommatis  368u  IPv6 0x392c3fa3b6d15e0f  0t0  TCP 
10.130.0.26:cslistener->cluster:45768 (CLOSE_WAIT)
java92065 kmavrommatis  369u  IPv6 0x392c3fa3b60fa58f  0t0  TCP 
10.130.0.26:cslistener->cluster:58830 (CLOSE_WAIT)
java92065 kmavrommatis  370u  IPv6 0x392c3fa39fd849cf  0t0  TCP 
10.130.0.26:cslistener->cluster:59112 (CLOSE_WAIT)


I am not sure how to debug this problem; my TCP stack knowledge is limited. Any 
ideas?
Thanks
Konstantinos
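
For readers hitting the same wall: with Commons HttpClient 3.x (which the file
manager's Solr catalog uses, as the follow-up below works out), a client-side
CLOSE_WAIT pile-up usually means the peer has closed its end of a kept-alive
socket and the local code never closes its side. An illustrative sketch of the
leaky pattern, not the actual OODT code:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.GetMethod;

    // Illustrative anti-pattern (assumed, for discussion): a fresh HttpClient
    // per request and no releaseConnection(). The pooled socket is never closed
    // locally, so once the server times out the keep-alive and sends its FIN,
    // the socket lingers in CLOSE_WAIT for the life of the JVM.
    public class LeakyClient {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 10; i++) {
                HttpClient client = new HttpClient(); // new pool every iteration
                GetMethod get = new GetMethod("http://localhost:8081/solr/select?q=*:*");
                client.executeMethod(get);
                get.getResponseBodyAsString();
                // missing: get.releaseConnection() -- each pass strands a socket
            }
        }
    }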

> -Original Message-
> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> Sent: Wednesday, July 09, 2014 4:55 AM
> To: dev@oodt.apache.org
> Subject: Re: File manager keeps connections to SOLR in CLOSE_WAIT state
> for hours
> 
> Hi Konstantinos,
> OK, I was able to scope this one and I have a few questions for you.
> 1) Which version of Solr are you using? Is it 3.5, 3.6, 4.0-ALPHA? If
> so please scope this issue [0], the solution would be to upgrade if you
> are not too long ahead with ingestion as fixes in Solr are worth having
> based on recent release cycles.
> 2) How many cores do you have on Solr server? Also what kind of setUp
> do you have? Replication at all?
> In recent versions of Solr 4.X SolrJ clients should now call shutdown()
> on their SolrServer object to let it know they don't want to re-use any
> existing connections anymore, and when Solr internally uses SolrJ to
> talk to other nodes in SolrCloud it should be doing this (as of
> 4.0-ALPHA), so this is why I ask.
> Lewis
> 
> [0] https://issues.apache.org/jira/browse/SOLR-3280
> 
> 
> On Tue, Jul 8, 2014 at 7:14 AM, Konstantinos Mavrommatis <
> kmavromma...@celgene.com> wrote:
> 
> > Hi,
> > I have setup OODT filemanager on port 9000, using SOLR as the
> indexing
> > service on port 8081. They are both setup on the same computer, while
> > crawler runs on a number of different compute nodes spread across the
> > local network and the cloud.
> >
> > When the crawler runs and ingests files I notice that there are
> > several connections that open to solr and remain in CLOSE_WAIT state
> for hours.
> > any idea why this happens? Moving forward I am planning to use
> several
> > hundreds of crawler instances, each running on different computer,
> > that will create thousands of such connections and will probably
> > create problems to the system.
> > Thanks in advance for any help
> > Kostas
> >
> >  $lsof -i :8081
> > COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF
> NODE
> > NAME
> > java92065 kmavrommatis   75u  IPv6 0x392c3fa3b63b29cf  0t0
> TCP
> > localhost:49205->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   76u  IPv6 0x392c3fa3b6dcbbcf  0t0
> TCP
> > localhost:49206->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   77u  IPv6 0x392c3fa39fd12e0f  0t0
> TCP
> > localhost:49207->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   78u  IPv6 0x392c3fa39fdcdbcf  0t0
> TCP
> > localhost:49208->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   79u  IPv6 0x392c3fa3b62cde0f  0t0
> TCP
> > localhost:49209->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   80u  IPv6 0x392c3fa39fa2714f  0t0
> TCP
> > localhost:49210->localhost:sunproxyadmin (CLOSE_WAIT)
> > java92065 kmavrommatis   81u  IP

UPDATE-PROBABLY SOLVED: File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-10 Thread Konstantinos Mavrommatis
Hi,

Following the suggestion at

http://blogs.nuxeo.com/development/2013/02/using-httpclient-properly-avoid-closewait-tcp-connections/



I modified the code in

src/main/java/org/apache/oodt/cas/filemgr/catalog/solr/SolrClient.java



and added the call method.setRequestHeader("Connection", "close") (see the 
excerpt below; it was highlighted in red in the original message).

This seems to have resolved the problem; so far, ingestion of a few thousand 
files has not produced any stale connections.

Please let me know your thoughts on this solution.

Best,

Konstantinos



==

private String doHttp(HttpMethod method) throws Exception {

    StringBuilder response = new StringBuilder();
    BufferedReader br = null;
    try {

        // send request
        HttpClient httpClient = new HttpClient();

        // 10 July 2014. Attempting to avoid problems with CLOSE_WAIT connections. Based on
        // http://blogs.nuxeo.com/development/2013/02/using-httpclient-properly-avoid-closewait-tcp-connections/
        method.setRequestHeader("Connection", "close");

        int statusCode = httpClient.executeMethod(method);

        // read response
        if (statusCode != HttpStatus.SC_OK) {

            // still consume the response
            method.getResponseBodyAsString();
            throw new CatalogException("HTTP method failed: " + method.getStatusLine());

        } else {

            // read the response body.
            br = new BufferedReader(new InputStreamReader(method.getResponseBodyAsStream()));
            String readLine;
            while ((readLine = br.readLine()) != null) {
                response.append(readLine);
            }



===
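
For completeness, "Connection: close" fixes the leak by giving up keep-alive
entirely. The blog post's other recommendation is to keep one HttpClient with a
pooling connection manager and always release the method; a hedged sketch of
that variant (generic Exception instead of CatalogException, to stay
self-contained):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpMethod;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

    // Sketch: one shared client whose manager pools sockets, plus a finally
    // block that always releases the method so the pool can reuse or cleanly
    // close the connection instead of leaving it half-open.
    public class PooledHttpExample {

        // One client for the life of the class, instead of one per request.
        private static final HttpClient CLIENT =
                new HttpClient(new MultiThreadedHttpConnectionManager());

        static String doHttp(HttpMethod method) throws Exception {
            try {
                int statusCode = CLIENT.executeMethod(method);
                // Always drain the body; an unread body can pin the connection.
                StringBuilder response = new StringBuilder();
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(method.getResponseBodyAsStream()));
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                if (statusCode != HttpStatus.SC_OK) {
                    throw new Exception("HTTP method failed: " + method.getStatusLine());
                }
                return response.toString();
            } finally {
                method.releaseConnection(); // hand the socket back to the pool
            }
        }
    }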



Information about the system:

OS is MacOS Mavericks,

SOLR version : 4.6.1 , SOLR runs on a single server, no replication at all.

Tomcat version: 7.0.50

OODT version is 0.6

Java(TM) SE Runtime Environment (build 1.7.0_51-b13)







> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> Sent: Wednesday, July 09, 2014 4:57 AM
> To: dev@oodt.apache.org
> Subject: Re: File manager keeps connections to SOLR in CLOSE_WAIT state
> for hours
>
> It's just occurred to me that everything in
> org.apache.oodt.cas.filemgr.catalog.solr.SolrCatalog (well, specifically
> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient) is done through
> HTTPClient so all of the above may not be relevant.
> Can you please scope all the same?
> Thanks
> Lewis
>
>
> On Tue, Jul 8, 2014 at 10:54 PM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com<mailto:lewis.mcgibb...@gmail.com>> wrote:
>
> > Hi Konstantinos,
> > OK, I was able to scope this one and I have a few questions for you.
> > 1) Which version of Solr are you using? Is it 3.5, 3.6, 4.0-ALPHA? If
> > so please scope this issue [0], the solution would be to upgrade if
> > you are not too long ahead with ingestion as fixes in Solr are worth
> > having based on recent release cycles.
> > 2) How many cores do you have on Solr server? Also what kind of setUp
> > do you have? Replication at all?
> > In recent versions of Solr 4.X SolrJ clients should now call
> > shutdown() on their SolrServer object to let it know they don't want
> > to re-use any existing connections anymore, and when Solr internally
> > uses SolrJ to talk to other nodes in SolrCloud it should be doing this
> > (as of 4.0-ALPHA), so this is why I ask.
> > Lewis
> >
> > [0] https://issues.apache.org/jira/browse/SOLR-3280
> >
> >
> > On Tue, Jul 8, 2014 at 7:14 AM, Konstantinos Mavrommatis <
> > kmavromma...@celgene.com<mailto:kmavromma...@celgene.com>> wrote:
> >
> >> Hi,
> >> I have setup OODT filemanager on port 9000, using SOLR as the
> >> indexing service on port 8081. They are both setup on the same
> >> computer, while crawler runs on a number of different compute nodes
> >> spread across the local network and the cloud.
> >>
> >> When the crawler runs and ingests files I notice that there are
> >> several connections that open to solr and remain in CLOSE_WAIT state
> >> for hours.
> >> Any idea why this happens? Moving forward I am planning to use
> >> several hundreds of crawler instances, each running on a different
> >> computer, that will create thousands of such connections and will
> >> probably create problems for the system.
> >> Thanks in advance for any help

RE: UPDATE-PROBABLY SOLVED: File manager keeps connections to SOLR in CLOSE_WAIT state for hours

2014-07-10 Thread Konstantinos Mavrommatis
Unless I am missing something I am not able to create an issue in JIRA :(
Thanks
K

> -Original Message-
> From: Ramirez, Paul M (398J) [mailto:paul.m.rami...@jpl.nasa.gov]
> Sent: Thursday, July 10, 2014 4:06 PM
> To: 
> Subject: Re: UPDATE-PROBABLY SOLVED: File manager keeps connections to
> SOLR in CLOSE_WAIT state for hours
> 
> Konstantinos,
> 
> Could you open a Jira issue for this and attach a patch? This would be
> helpful for the project.
> 
> https://issues.apache.org/jira/browse/OODT/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> 
> 
> Thanks,
> Paul Ramirez
> 
> > On Jul 10, 2014, at 5:23 AM, "Konstantinos Mavrommatis"
>  wrote:
> >
> > Hi,
> >
> > Following the suggestion at
> >
> > http://blogs.nuxeo.com/development/2013/02/using-httpclient-properly-
> a
> > void-closewait-tcp-connections/
> >
> >
> >
> > I modified the code in
> >
> >
> src/main/java/org/apache/oodt/cas/filemgr/catalog/solr/SolrClient.java
> >
> >
> >
> > and added the line highlighted in red
> (method.setRequestHeader("Connection","close").
> >
> > This seems to have resolved the problem at least until now ingestion
> of few thousand files have not produced any stale connection.
> >
> > Please let me know your thoughts on this solution.
> >
> > Best,
> >
> > Konstantinos
> >
> >
> >
> > ==
> >
> > private String doHttp(HttpMethod method) throws Exception {
> >
> >
> >
> >StringBuilder response = new StringBuilder();
> >
> >BufferedReader br = null;
> >
> >try {
> >
> >
> >
> >// send request
> >
> >HttpClient httpClient = new HttpClient();
> >
> >// 10 July 2014. attempting to avoid problems
> > with CLOSE_WAIT connections Based on
> >
> >
> > //http://blogs.nuxeo.com/development/2013/02/using-httpclient-
> properly
> > -avoid-closewait-tcp-connections/
> >
> >method.setRequestHeader("Connection",
> "close");
> >
> >int statusCode = httpClient.executeMethod(method);
> >
> >
> >
> >// read response
> >
> >if (statusCode != HttpStatus.SC_OK) {
> >
> >
> >
> >// still consume the response
> >
> >method.getResponseBodyAsString();
> >
> >  throw new CatalogException("HTTP method failed: " +
> > method.getStatusLine());
> >
> >
> >
> >} else {
> >
> >
> >
> >// read the response body.
> >
> >br = new BufferedReader(new
> > InputStreamReader(method.getResponseBodyAsStream()));
> >
> >String readLine;
> >
> >while(((readLine = br.readLine()) != null)) {
> >
> >  response.append(readLine);
> >
> >}
> >
> >
> >
> > ===
> >
> >
> >
> > Information about the system:
> >
> > OS is MacOS Mavericks,
> >
> > SOLR version : 4.6.1 , SOLR runs on a single server, no replication
> at all.
> >
> > Tomcat verision: 7.0.50
> >
> > OODT version is 0.6
> >
> > Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> >
> >
> >
> >
> >
> >
> >
> >> -Original Message-
> >
> >> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> >
> >> Sent: Wednesday, July 09, 2014 4:57 AM
> >
> >> To: dev@oodt.apache.org
> >
> >> Subject: Re: File manager keeps connections to SOLR in CLOSE_WAIT
> >> state
> >
> >> for hours
> >
> >
> >> It's just occoured to me that everything in
> >
> >> org.apache.oodt.cas.filemgr.catalog.solr.SolrCatalog (well,
> >
> >> specifically
> >
> >> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient) is done through
> >
> >> HTTPClient so all of the above may not be relevant.
> >
> >> Can you please scope all the same?
> >
> >> Thanks
> >
> >> Lewis
> >
> >
> >
> >> On Tue, Jul 8, 2014 at 10:54 PM, Lewis John Mcgibbney <

How to ingest files when metadata contain non standard characters?

2014-10-07 Thread Konstantinos Mavrommatis
Hi,

I am trying to ingest a large number of files. The metadata for these files 
exist in .met files.

Many of the metadata fields contain characters like '<>&$' etc.

Running crawler on these metadata results in failure.

When I try to escape the characters using HTML encoding, e.g. '>' becomes &gt;, 
I still get errors and the crawler cannot ingest the files.



Here is an example of the offending lines in the .met file before and after 
HTML encoding

sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex 
--libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c 
/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 
<(gunzip -c 
/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o 
/gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt
 -p 8  --no_bias_correct  





sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex 
--libtype 'T=PE:O=&gt;&lt;:S=AS' -1 &lt;(gunzip -c 
/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 
&lt;(gunzip -c 
/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o 
/gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt
 -p 8  --no_bias_correct  



If I remove the offending characters (in this case '<>') the ingestion goes 
on without any issues.



The crawler command is :

./crawler_launcher --operation --launchAutoCrawler --productPath $FILEPATH
--filemgrUrl $OODT_FILEMGR_URL --clientTransferer
org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
--mimeExtractorRepo ../policy/mime-extractor-map.xml --noRecur --crawlForDirs



The error message I get when I run the crawler is:
INFO: StdIngester: ingesting product: ProductName: [A1_1.Sailfish.sfish]: ProductType: [GenericFile]: FileLocation: [/datavault/RNA-Seq/Processed/Sailfish-transcriptCounts/]

org.apache.xmlrpc.XmlRpcException: java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
  at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
  at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
  at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
  at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:194)
  at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:185)
  at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:178)
  at org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient.ingestProduct(XmlRpcFileManagerClient.java:1178)
  at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:199)
  at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
  at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
  at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
  at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
  at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
  at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
  at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
  at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)

Oct 07, 2014 11:17:18 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
SEVERE: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request -- rolling back ingest

java.lang.Exception: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
  at org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient.ingestProduct(XmlRpcFileManagerClient.java:1279)
  at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:199)
  at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
  at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
  at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
  at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
  at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)

 

RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Hi Lewis

I escaped the characters using the CGI::escapeHTML function from the CGI perl 
module.

The difference between the two versions (mine escaped vs yours escaped) is in 
the encoding of the single quote "'" character, if I am not mistaken. I want to 
clarify this because your email came through as plain ASCII (not HTML).



I did try your command and it worked !!!

Now the question is how to do this encoding (your version) ☺

Thanks

K
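
On the Java side, one candidate for doing the encoding (rather than perl) is
commons-lang's StringEscapeUtils.escapeXml, which covers the five basic XML
entities including apos; since the single-quote handling is exactly the
discrepancy discussed above, treat this as a sketch to test rather than a
confirmed fix:

    import org.apache.commons.lang.StringEscapeUtils;

    // Sketch: escape a metadata value before writing it into a .met file.
    // escapeXml handles the five basic entities (lt, gt, amp, quot, apos).
    public class MetEscapeExample {
        public static void main(String[] args) {
            String generatorString =
                    "sailfish quant --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c R1.fastq.gz)";
            System.out.println(StringEscapeUtils.escapeXml(generatorString));
            // '<' and '>' become &lt; and &gt;; the single quotes become &apos;
        }
    }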



> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> Sent: Wednesday, October 08, 2014 1:43 PM
> To: dev@oodt.apache.org
> Subject: Re: How to ingest files when metadata contain non standard
> characters?
>
> Hi Kos,
> I take you up on your challenge ;) However I don't know if this will
> fix it.
>
> On Tue, Oct 7, 2014 at 11:31 PM, Konstantinos Mavrommatis <
> kmavromma...@celgene.com<mailto:kmavromma...@celgene.com>> wrote:
>
> > sailfish quant --index
> > /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype
> > 'T=PE:O=><:S=AS' -1 <(gunzip -c
> > /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz)
> > -2 <(gunzip -c
> > /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz)
> > -o
> > /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt
> > -p 8  --no_bias_correct
>
> OK, the code above is what you initially pasted...
>
> > sailfish quant --index
> > /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype
> > *'T=PE:O=><:S=AS'* -1 <(gunzip -c
> > /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz)
> > -2 <(gunzip -c
> > /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz)
> > -o
> > /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt
> > -p 8  --no_bias_correct
>
> The code above is what you pasted once you had escaped everything. Did
> you do this manually? I get a different output which I've pasted below
>
> sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex
> --libtype *'T=PE:O=><:S=AS'* -1 <(gunzip -c
> /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz)
> -2 <(gunzip -c
> /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz)
> -o
> /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt
> -p 8  --no_bias_correct
>
> Please notice the difference in the part which I have bolded. Can you
> try reingesting and see if you come up donald trumps?
>
> > org.apache.oodt.cas.filemgr.structs.exceptions.IngestException:
> > exception ingesting product: [A1_1.Sailfish.sfish]: Message: Failed to
> > ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] :
> > java.lang.Exception:
> > org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
> > ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87]
> > : HTTP method failed: HTTP/1.1 400 Bad Request
>
> BTW, you also have AGAIN highlighted the horrible opaque Product
> objects we get as Exception output. I logged an issue for this last
> week.
> https://issues.apache.org/jira/browse/OODT-755
> We need to fix this and I will try my damnedest to hack it at the
> weekend.
> Thanks
> Lewis



RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Thanks Chris,

attached is an offending file before escape.
For the record perl module HTML::Entities does provide an escapeHTML 
alternative that produces acceptable files.

Thanks
K


> -Original Message-
> From: Chris Mattmann [mailto:chris.mattm...@gmail.com]
> Sent: Wednesday, October 08, 2014 11:38 AM
> To: dev@oodt.apache.org
> Subject: Re: How to ingest files when metadata contain non standard
> characters?
> 
> cas-metadata should handle this escaping/unescaping in its SerDe
> capabilities.
> 
> Kostas, can you provide the exact file that I can test on and upload it
> to JIRA?
> 
> 
> Chris Mattmann
> chris.mattm...@gmail.com
> 
> 
> 
> 
> -Original Message-
> From: Lewis John Mcgibbney 
> Reply-To: 
> Date: Thursday, October 9, 2014 at 2:59 AM
> To: "dev@oodt.apache.org" 
> Subject: Re: How to ingest files when metadata contain non standard
> characters?
> 
> >Hi Kos,
> >Thanks for reply
> >
> >On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <
> >kmavromma...@celgene.com> wrote:
> >
> >> I escaped the characters using the CGI::escapeHTML function from the
> >> CGI perl module.
> >>
> >
> >Wow. I am surprised at this one. I wonder if this is a bug which results
> >in the discrepancy or if this is intentional behaviour!
> >
> >
> >>
> >> The difference between the two versions (mine escaped vs yours
> >> escaped) is in the encoding of the single quote "'" character, if I
> >> am not mistaken.
> >> I want to clarify this because your email came as plain ASCII (not
> >> HTML)
> >>
> >
> >Yes that is correct.
> >
> >
> >>
> >> I did try your command and it worked !!!
> >>
> >
> >OK grand.
> >
> >
> >>
> >> Now the question is how to do this encoding (your version) ☺
> >>
> >>
> >Is this the question? My thoughts would be that this should be
> >encapsulated within OODT somewhere and that it should not be necessary
> >to escape everything as you/we have been doing. This is extremely time
> >consuming and painful.
> >
> >I escaped everything here
> >http://www.freeformatter.com/html-escape.html
> >
> >and compared the strings here
> >http://text-compare.com/
> >
> >The latter resource will verify that it is the single quote that is
> the
> >offending char here.
> >Thanks
> >Lewis
> 




RE: How to ingest files when metadata contain non standard characters?

2014-10-08 Thread Konstantinos Mavrommatis
Here is the offending file before escape:



<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>derived_from</key>
    <val>/gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex</val>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz</val>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R2.fastq.gz</val>
  </keyval>
  <keyval>
    <key>FilePath</key>
    <val>/gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HM1_1.Sailfish.sfish</val>
  </keyval>
  <keyval>
    <key>start_execution</key>
    <val>Tue Oct  7 20:49:12 2014</val>
  </keyval>
  <keyval>
    <key>ingest_user</key>
    <val>kmavrommatis</val>
  </keyval>
  <keyval>
    <key>end_execution</key>
    <val>Tue Oct  7 21:03:47 2014</val>
  </keyval>
  <keyval>
    <key>run_user</key>
    <val>kmavrommatis</val>
  </keyval>
  <keyval>
    <key>file_host</key>
    <val>ussdgsphpccas02</val>
  </keyval>
  <keyval>
    <key>generator</key>
    <val>sailfish</val>
  </keyval>
  <keyval>
    <key>run_host</key>
    <val>ussdgsphpccmp01</val>
  </keyval>
  <keyval>
    <key>sample_id</key>
    <val>2569</val>
  </keyval>
  <keyval>
    <key>generator_version</key>
    <val>sailfish[0.6.3]</val>
  </keyval>
  <keyval>
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
  <keyval>
    <key>analysis_task</key>
    <val>38</val>
  </keyval>
  <keyval>
    <key>generator_string</key>
    <val>"sailfish quant --index /gpfs/celgene/reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R1.fastq.gz) -2 <(gunzip -c /gpfs/archive/RED/DA072/RNA-Seq/RawData/FastqFiles/HM1_1_R2.fastq.gz) -o /gpfs/archive/RED/DA072/RNA-Seq/Processed/Sailfish-transcriptCounts/HM1_1.Sailfish.txt -p 8  --no_bias_correct "</val>
  </keyval>
</cas:metadata>



Problem installing oodt 0.12.

2016-03-01 Thread Konstantinos Mavrommatis
Hi,
I am trying to install the latest oodt 0.12 from the src.zip file.
The installation is on a clean Ubuntu 14.04 with maven 2.2.1 and Oracle Java JDK 
1.8.0_74.
After unzipping the archive I run
mvn clean install
and I get the following error:

$ more org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.txt
-------------------------------------------------------------------------------
Test set: org.apache.oodt.cas.protocol.imaps.TestImapsProtocol
-------------------------------------------------------------------------------
Tests run: 3, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 2.466 sec <<< FAILURE! - in org.apache.oodt.cas.protocol.imaps.TestImapsProtocol
testLSandGET(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  Time elapsed: 0.374 sec  <<< FAILURE!
junit.framework.AssertionFailedError: Failed to connect to GreenMail IMAPS server : Failed to connected to IMAPS server localhost with username bfos...@google.com : java.security.cert.CertificateException: Certificates does not conform to algorithm constraints
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.TestCase.fail(TestCase.java:227)
at org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsProtocol.java:62)

testCDAndPWD(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  Time elapsed: 1.036 sec  <<< ERROR!
java.lang.RuntimeException: Couldnt start at least one of the mail services.
at com.icegreen.greenmail.util.GreenMail.start(GreenMail.java:91)
at org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsProtocol.java:56)

testDelete(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  Time elapsed: 1.025 sec  <<< ERROR!
java.lang.RuntimeException: Couldnt start at least one of the mail services.
at com.icegreen.greenmail.util.GreenMail.start(GreenMail.java:91)
at org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsProtocol.java:56)

Is there a dependency on an email server? If so, why is that?

I also tried to install it using Radix and the commands described in 
https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT#RADiXPoweredByOODT-TheCommands.
In this case after I run
mvn install
I get the error message

[ERROR] BUILD ERROR
[INFO] 
[INFO] Failed to resolve artifact.

Unable to get dependency information: Unable to read the metadata file for 
artifact 'org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar': Cannot 
find parent: org.apache.geronimo.genesis.config:config for project: 
null:project-config:pom:1.1 for project null:project-config:pom:1.1
  org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1

from the specified remote repositories:
  central (http://repo1.maven.org/maven2),
  sonatype-nexus (https://oss.sonatype.org/content/groups/public),
  maven2 (http://download.java.net/maven/2),
  apache.snapshots (http://repository.apache.org/snapshots/)

Path to dependency:
1) com.mycompany:oodt-extensions:jar:0.1
2) org.apache.oodt:cas-filemgr:jar:0.12
3) org.apache.solr:solr-core:jar:1.3.0


Am I doing something wrong?
Thanks in advance for your help.

Kostas


*
THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS
CONFIDENTIAL AND MAY CONTAIN LEGALLY PRIVILEGED
INFORMATION INTENDED ONLY FOR THE USE OF THE INDIVIDUAL
OR INDIVIDUALS NAMED ABOVE.
If the reader is not the intended recipient, or the
employee or agent responsible to deliver it to the
intended recipient, you are hereby notified that any
dissemination, distribution or copying of this
communication is strictly prohibited. If you have
received this communication in error, please reply to the
sender to notify us of the error and delete the original
message. Thank You.


RE: Problem installing oodt 0.12.

2016-03-02 Thread Konstantinos Mavrommatis
Thanks,
I tried the -DskipTests but now I got stuck at another point:

[ERROR] BUILD ERROR
[INFO] 
[INFO] Failed to resolve artifact.

Unable to get dependency information: Unable to read the metadata file for 
artifact 'org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar': Cannot 
find parent: org.apache.geronimo.genesis.config:config for project: 
null:project-config:pom:1.1 for project null:project-config:pom:1.1
  org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1

from the specified remote repositories:
  central (http://repo1.maven.org/maven2),
  sonatype-nexus (https://oss.sonatype.org/content/groups/public),
  apache.snapshots (http://repository.apache.org/snapshots)

Path to dependency:
1) org.apache.oodt:cas-filemgr:jar:0.12
2) org.apache.solr:solr-core:jar:1.3.0






-Original Message-
From: Tom Barber [mailto:tom.bar...@meteorite.bi] 
Sent: Tuesday, March 01, 2016 11:59 PM
To: dev@oodt.apache.org
Subject: Re: Problem installing oodt 0.12.

Also FYI I'd suggest maven 3.x these days. I don't think it will make a 
difference in your case but at least it's relatively up to date.

Tom
On 2 Mar 2016 07:54, "Tom Barber"  wrote:

> Hi Kostas
>
> There is a dependency but the tests should mock it. A while ago I
> fixed a lot of that stuff where the tests failed to mock services and
> instead bound to ports and things, because from time to time it does cause
> issues.
>
> For now I'd suggest you run -DskipTests as they've been run and passed 
> prior to release. Not ideal but will unblock you.
>
> I'll try and get that test resolved properly, soon.
>
> Cheers,
>
> Tom
> On 2 Mar 2016 06:52, "Konstantinos Mavrommatis" 
> 
> wrote:
>
>> Hi,
>> I am trying to install the latest oodt 0.12 from the src.zip file.
>> The installation is on a clean Ubuntu 14.04 with maven2.2.1 and 
>> Oracle java JDK 1.8.0_74 After unzipping the archive I run mvn clean 
>> install and I get the following error:
>>
>> $ more org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.txt
>>
>> -
>> -- Test set: 
>> org.apache.oodt.cas.protocol.imaps.TestImapsProtocol
>>
>> -
>> -- Tests run: 3, Failures: 1, Errors: 2, Skipped: 0, Time 
>> elapsed: 2.466 sec <<< FAILURE! - in org.apache.oodt.cas.protocol 
>> .imaps.TestImapsProtocol
>> testLSandGET(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  
>> Time
>> elapsed: 0.374 sec  <<< FAILURE!
>> junit.framework.AssertionFailedError: Failed to connect to GreenMail 
>> IMAPS server : Failed to connected to IMAPS server localhost with 
>> username bfos...@google.com :
>> java.security.cert.CertificateException: Certificates does not 
>> conform to a lgorithm constraints
>> at junit.framework.Assert.fail(Assert.java:57)
>> at junit.framework.TestCase.fail(TestCase.java:227)
>> at
>> org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsP
>> rotocol.java:62)
>>
>> testCDAndPWD(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  
>> Time
>> elapsed: 1.036 sec  <<< ERROR!
>> java.lang.RuntimeException: Couldnt start at least one of the mail 
>> services.
>> at com.icegreen.greenmail.util.GreenMail.start(GreenMail.java:91)
>>at
>> org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsP
>> rotocol.java:56)
>>
>> testDelete(org.apache.oodt.cas.protocol.imaps.TestImapsProtocol)  
>> Time
>> elapsed: 1.025 sec  <<< ERROR!
>> java.lang.RuntimeException: Couldnt start at least one of the mail 
>> services.
>> at com.icegreen.greenmail.util.GreenMail.start(GreenMail.java:91)
>> at
>> org.apache.oodt.cas.protocol.imaps.TestImapsProtocol.setUp(TestImapsP
>> rotocol.java:56)
>>
>> is there a dependency on a email server? If so why is that ?
>>
>> I also tried to install it using Radix and the commands described in 
>> https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT#RADiXPoweredByOODT-TheCommands.
>> In this case after I run
>> Mvn install
>> I get the error message
>>
>&

RE: Problem installing oodt 0.12.

2016-03-02 Thread Konstantinos Mavrommatis
org.apache.wicket.request.AbstractRequestCycleProcessor.processEvents(AbstractRequestCycleProcessor.java:92)
 at 
org.apache.wicket.RequestCycle.processEventsAndRespond(RequestCycle.java:1250)
 at org.apache.wicket.RequestCycle.step(RequestCycle.java:1329)
 at org.apache.wicket.RequestCycle.steps(RequestCycle.java:1436)
 at org.apache.wicket.RequestCycle.request(RequestCycle.java:545)
 at 
org.apache.wicket.protocol.http.WicketFilter.doGet(WicketFilter.java:486)

java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at 
org.apache.wicket.session.DefaultPageFactory.createPage(DefaultPageFactory.java:188)
 at 
org.apache.wicket.session.DefaultPageFactory.newPage(DefaultPageFactory.java:65)
 at 
org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.newPage(BookmarkablePageRequestTarget.java:298)
 at 
org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.getPage(BookmarkablePageRequestTarget.java:320)
 at 
org.apache.wicket.request.target.component.BookmarkablePageRequestTarget.processEvents(BookmarkablePageRequestTarget.java:234)
 at 
org.apache.wicket.request.AbstractRequestCycleProcessor.processEvents(AbstractRequestCycleProcessor.java:92)
 at 
org.apache.wicket.RequestCycle.processEventsAndRespond(RequestCycle.java:1250)
 at org.apache.wicket.RequestCycle.step(RequestCycle.java:1329)
 at org.apache.wicket.RequestCycle.steps(RequestCycle.java:1436)
 at org.apache.wicket.RequestCycle.request(RequestCycle.java:545)
 at 
org.apache.wicket.protocol.http.WicketFilter.doGet(WicketFilter.java:486)


-Original Message-
From: Tom Barber [mailto:tom.bar...@meteorite.bi] 
Sent: Wednesday, March 02, 2016 1:33 AM
To: dev@oodt.apache.org
Subject: Re: Problem installing oodt 0.12.

Okay Kostas

You should certainly use Radix unless you have good reason not to, it's by far 
the sanest way to build a project currently.

I use containers to check this stuff so maven can't cheat and use a cached 
dependency etc; it also isolates my tests from any other rubbish I have running 
on my laptop. So I've created a clean ubuntu trusty environment and here is my 
entire bash_history:

apt-get get update
apt-get update
apt-get install openjdk7-jdk
apt-cache search openjdk
apt-get install openjdk-7-jdk
apt-get install maven
mvn -v
curl -s "
https://urldefense.proofpoint.com/v2/url?u=https-3A__git-2Dwip-2Dus.apache.org_repos_asf-3Fp-3Doodt.git-3Ba-3Dblob-5Fplain-3Bf-3Dmvn_archetypes_radix_src_main_resources_bin_radix-3Bhb-3DHEAD&d=CwIFaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=Hxhb-K_1RTAMyy7OPQNBkP6KjbMlImt8JVSUeGy4Zmw&s=WIGMOxijYw_KyYYAMtZV_4ZCfXXOwMkClITLEe18iME&e=
 "
| bash
mv oodt oodt-src; cd oodt-src; mvn install cd ../oodt; ./bin/oodt start ps aux 
|grep oodt

At the end of that I had a working OODT.

I then dumped Oracle JDK8 on the server to test that and built OODT and it ran 
fine.

I'm not entirely sure whats causing that geronimo error, I do know that 
codehaus shutting down caused a few issues for a lot of projects. I would 
recommend updating to Maven 3, and clearing your ~/.m2/repository folder to 
make sure you don't have any half downloaded or stale dependencies.

From what I can see the 0.12 Radix build works fine as I copied the wiki page 
verbatim. Let me know how it goes!

Tom

On Wed, Mar 2, 2016 at 8:24 AM, Tom Barber  wrote:

> Not tried to build 0.12 yet, but I can do in 30 mins when I get to my 
> desk. Give maven 3 a whirl and see if you get any further.
>
> Tom
> On 2 Mar 2016 08:02, "Konstantinos Mavrommatis" 
> 
> wrote:
>
>> Thanks,
>> I tried the -DskipTests but now I got stuck at another point:
>>
>> [ERROR] BUILD ERROR
>> [INFO]
>> -
>> ---
>> [INFO] Failed to resolve artifact.
>>
>> Unable to get dependency information: Unable to read the metadata 
>> file for artifact 'org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar':
>> Cannot find parent: org.apache.geronimo.genesis.config:config for project:
>> null:project-config:pom:1.1 for project null:project-config:pom:1.1
>>   org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1
>>
>> from the specified remote repositories:
>>   central (http://repo1.maven.org/maven2),

Setting up filemanager with SOLR 5.5

2016-03-23 Thread Konstantinos Mavrommatis
Hi,
I have set up oodt using RADiX.

When I use the default Lucene catalog factory I manage to ingest a file with no 
problem:
# ./filemgr-client --url http://localhost:9000 --operation --ingestProduct 
--productName test.txt --productStructure Flat --productTypeName GenericFile 
--metadataFile file:///tmp/test.txt.met --refs file:///tmp/test.txt
ingestProduct: Result: afda62a1-f15f-11e5-a7d4-7d8cde2ab6dd


Then I installed solr 5.5/jetty and created a new core named oodt:
#solr create_core -c oodt
I ran the example command in the solr documentation and verified that this 
instance is able to index documents. I can also access the solr dashboard at 
the following URL: http://localhost:8983/solr 
I also modified the filemgr.properties file with the following lines:
filemgr.catalog.factory=org.apache.oodt.cas.filemgr.catalog.solr.SolrCatalogFactory
org.apache.oodt.cas.filemgr.catalog.solr.url=http://localhost:8983/solr

I have tried this with and without the schema.xml in the core directory; in 
both cases I get the same error.

When I try to ingest a file I get:
# ./filemgr-client --url http://localhost:9000 --operation --ingestProduct 
--productName test.txt --productStructure Flat --productTypeName GenericFile 
--metadataFile file:///tmp/test.txt.met --refs file:///tmp/test.txt
ERROR: Failed to ingest product 'test.txt' : java.lang.Exception: 
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error 
ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5eebbe] : HTTP 
method failed: HTTP/1.1 404 Not Found

Any ideas what is going on?

Thanks
K
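
One quick sanity check while debugging a 404 like this is to hit the core
directly: as the replies below work out, the catalog URL must include the core
name, and posting to http://localhost:8983/solr/update (no core) is what
produces the 404. A small sketch, assuming a core named oodt and Solr's stock
ping handler:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.GetMethod;

    // Sketch: verify the configured Solr URL actually resolves to a live core
    // before blaming the file manager. Expect HTTP 200 if the core exists.
    public class SolrCoreCheck {
        public static void main(String[] args) throws Exception {
            HttpClient client = new HttpClient();
            GetMethod ping = new GetMethod("http://localhost:8983/solr/oodt/admin/ping");
            try {
                int status = client.executeMethod(ping);
                System.out.println("ping status: " + status);
            } finally {
                ping.releaseConnection();
            }
        }
    }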



RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
Hi,
I am using oodt v 0.12
Interestingly the file etc/logging.properties was not in its place, although 
this was a clean installation on a clean, newly started Ubuntu 14.04 server on 
AWS (!!). I copied the file from 
oodt-src/filemgr/target/classes/etc/logging.properties and set the logging 
level.
Running the same command again, the end of the log shows the following error 
lines:

INFO: Posting message:<doc><field name="id">93addb3e-4d02-41ca-bee6-fbb321d6c890</field><field name="CAS.ProductId">93addb3e-4d02-41ca-bee6-fbb321d6c890</field><field name="CAS.ProductName">test.txt</field><field name="CAS.ProductTypeName">GenericFile</field><field name="CAS.ProductReceivedTime">2016-03-24T07:12:01Z</field><field name="CAS.ProductTypeId">urn:oodt:GenericFile</field><field name="CAS.ProductStructure">Flat</field><field name="CAS.ProductTransferStatus">TRANSFERING</field></doc> to URL:http://localhost:8983/solr/update
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.catalog.solr.SolrClient 
index
SEVERE: HTTP method failed: HTTP/1.1 404 Not Found
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
catalogProduct
SEVERE: ingestProduct: CatalogException when adding Product: test.txt to 
Catalog: Message: HTTP method failed: HTTP/1.1 404 Not Found
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
ingestProductCore
SEVERE: HTTP method failed: HTTP/1.1 404 Not Found
Mar 24, 2016 7:12:01 AM 
org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
SEVERE: Failed to ingest product [ name:test.txt] :java.lang.Exception: 
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error 
ingesting product [org.apache.oodt.cas.filemgr.structs.Product@60a79fbb] : HTTP 
method failed: HTTP/1.1 404 Not Found -- rolling back ingest
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.catalog.solr.SolrClient 
delete
INFO: Posting message:id:null to 
URL:http://localhost:8983/solr/update?commit=true
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.catalog.solr.SolrClient 
delete
SEVERE: HTTP method failed: HTTP/1.1 404 Not Found
Mar 24, 2016 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
removeProduct
WARNING: Exception modifying product: [null]: Message: HTTP method failed: 
HTTP/1.1 404 Not Found
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: HTTP method 
failed: HTTP/1.1 404 Not Found
at 
org.apache.oodt.cas.filemgr.catalog.solr.SolrClient.delete(SolrClient.java:130)
at 
org.apache.oodt.cas.filemgr.catalog.solr.SolrCatalog.removeProduct(SolrCatalog.java:165)
at 
org.apache.oodt.cas.filemgr.system.XmlRpcFileManager.removeProduct(XmlRpcFileManager.java:1113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.xmlrpc.Invoker.execute(Invoker.java:130)
at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:84)
at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:146)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:125)
at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:761)
at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:642)
at java.lang.Thread.run(Thread.java:745)

Mar 24, 2016 7:12:01 AM 
org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
SEVERE: Failed to rollback ingest of product 
[org.apache.oodt.cas.filemgr.structs.Product@2782f72b] : java.lang.Exception: 
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error 
ingesting product [org.apache.oodt.cas.filemgr.structs.Product@60a79fbb] : HTTP 
method failed: HTTP/1.1 404 Not Found
Mar 24, 2016 7:12:01 AM 
org.apache.oodt.cas.filemgr.cli.action.IngestProductCliAction execute
SEVERE: java.lang.Exception: 
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error 
ingesting product [org.apache.oodt.cas.filemgr.structs.Product@60a79fbb] : HTTP 
method failed: HTTP/1.1 404 Not Found
ERROR: Failed to ingest product 'test.txt' : java.lang.Exception: 
org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error 
ingesting product [org.apache.oodt.cas.filemgr.structs.Product@60a79fbb] : HTTP 
method failed: HTTP/1.1 404 Not Found

-----Original Message-----



From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
Sent: Wednesday, March 23, 2016 7:25 PM
To: dev@oodt.apache.org
Subject: Re: Setting up filemanager with SOLR 5.5

Hi K,
Which version of OODT are you using?
Can you set logging to debug, restart your filemanager instance and tail the 
log for any more clues. If you get any more clues then post them here.
Also, if you are running off of master branch then this is an excellent 
opportunity for us to improve the error message, printing the product ID as 
opposed to the opaque object... the latter is relatively useless.



On Wed, Mar 23, 2016 at 6:48 PM, Konstantinos Mavrommatis < 
kmavromma...@celgene.com> wrote:

> Hi,
> I

RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
clipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
===

To resolve it I modified the managed-schema file and replaced one line with 
another (the exact schema lines did not survive the plain-text archive).

Now the ingestion process works without problem.
Since I am very new to SOLR this may not be the correct approach, but for the 
time being it works - if somebody has a better idea please chime in.
Thanks
K

-Original Message-
From: Tom Barber [mailto:tom.bar...@meteorite.bi] 
Sent: Thursday, March 24, 2016 12:37 AM
To: dev@oodt.apache.org
Subject: Re: Setting up filemanager with SOLR 5.5

Ooh, I see, 5.5! Sorry. not had my morning coffee yet. Yeah I tried 5.5 a while 
back out of curiosity rather than a use case and it certainly didn't work OOTB.

Tom

On Thu, Mar 24, 2016 at 7:32 AM, Tom Barber  wrote:

> I appreciate its cheating, but when you build OODT with Radix did you 
> use the Solr maven profile? We fixed it up in 0.12 and it seemed to be 
> working fine, although I wasn't the one who tested it fully.
>
> Tom
>
> On Thu, Mar 24, 2016 at 7:15 AM, Konstantinos Mavrommatis < 
> kmavromma...@celgene.com> wrote:
>
>> Hi,
>> I am using oodt v 0.12
>> Interestingly the file etc/logging.properties was not in its place 
>> although this was a clean installation on a clean, newly started 
>> Ubuntu
>> 14.04 server on AWS (!!) I copied the file from the 
>> oodt-src/filemgr/target/classes/etc/logging.properties and set the 
>> logging level.
>> Running the same command:
>> At the end of the file there are following error lines
>>
>> INFO: Posting message:<doc><field name="id">93addb3e-4d02-41ca-bee6-fbb321d6c890</field>
>> <field name="CAS.ProductId">93addb3e-4d02-41ca-bee6-fbb321d6c890</field>
>> <field name="CAS.ProductName">test.txt</field>
>> <field name="CAS.ProductTypeName">GenericFile</field>
>> <field name="CAS.ProductReceivedTime">2016-03-24T07:12:01Z</field>
>> <field name="CAS.ProductTypeId">urn:oodt:GenericFile</field>
>> <field name="CAS.ProductStructure">Flat</field>
>> <field name="CAS.ProductTransferStatus">TRANSFERING</field></doc> to URL:
>> http://localhost:8983/solr/update
>> Mar 24, 2016 7:12:01 AM
>> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient index
>> SEVERE: HTTP method failed: HTTP/1.1 404 Not Found Mar 24, 2016 
>> 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
>> catalogProduct
>> SEVERE: ingestProduct: CatalogException when adding Product: test.txt 
>> to
>> Catalog: Message: HTTP method failed: HTTP/1.1 404 Not Found Mar 24, 
>> 2016 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
>> ingestProductCore
>> SEVERE: HTTP method failed: HTTP/1.1 404 Not Found Mar 24, 2016 
>> 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient 
>> ingestProduct
>> SEVERE: Failed to ingest product [ name:test.txt] :java.lang.Exception:
>> org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: 
>> Error ingesting product 
>> [org.apache.oodt.cas.filemgr.structs.Product@60a79fbb]
>> : HTTP method failed: HTTP/1.1 404 Not Found -- rolling back ingest 
>> Mar 24, 2016 7:12:01 AM 
>> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient delete
>> INFO: Posting message:id:null to URL:
>> http://localhost:8983/solr/update?commit=true
>> Mar 24, 2016 7:12:01 AM
>> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient delete
>> SEVERE: HTTP method failed: HTTP/1.1 404 Not Found Mar 24, 2016 
>> 7:12:01 AM org.apache.oodt.cas.filemgr.system.XmlRpcFileManager 
>> removeProduct
>> WARNING: Exception modifying product: [null]: Message: HTTP method
>> failed: HTTP/1.1 404 Not Found
>> org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: HTTP 
>> method failed: HTTP/1.1 404 Not Found
>> at
>> org.apache.oodt.cas.filemgr.catalog.solr.SolrClient.delete(SolrClient.java:130)
>> at
>> org.apache.oodt.cas.filemgr.catal

RE: Setting up filemanager with SOLR 5.5

2016-03-24 Thread Konstantinos Mavrommatis
I created an issue and copied the relevant lines from my managed-schema file
into the comments section of the issue.
Let me know if this works.
Thanks
K

-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
Sent: Thursday, March 24, 2016 6:07 PM
To: dev@oodt.apache.org
Subject: Re: Setting up filemanager with SOLR 5.5

Hi Kos,
The schema field definitions in the file manager schema sound like they need a
bit of an overhaul.
Are you able to file an issue against master and submit a PR?
Thanks
Lewis

On Thursday, March 24, 2016, Konstantinos Mavrommatis < 
kmavromma...@celgene.com> wrote:

> Hi,
>
> I seem to have solved the issue, although I have not tested the setup
> extensively.
>
> 1. The actual url to solr needs to contain the core name. So the correct
> url that needs to be in the filemgr.properties file is
> "http://localhost:8983/solr/oodt/", where the last part 'oodt' corresponds
> to the name of the core.
> 2. The schema.xml provided with oodt is not compatible with the classes of
> this version of SOLR. Instead I set up the solr core using the
> data_driven_schema_configs method, and copied the definitions of the CAS.*
> fields into the file managed-schema. A sketch of both steps follows below.
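>
> A sketch of both steps, assuming a core named 'oodt' and Solr 5.5's
> bin/solr script; the catalog URL property key below should be
> double-checked against the filemgr.properties shipped with the file
> manager:
>
>   # create a core from the data-driven example configs (Solr 5.5)
>   bin/solr create -c oodt -d data_driven_schema_configs
>
>   # filemgr/etc/filemgr.properties -- point the Solr catalog at the core
>   org.apache.oodt.cas.filemgr.catalog.solr.url=http://localhost:8983/solr/oodt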
>
> As a result I could now ingest files in SOLR, but only partially. The
> first action of adding information about a document (i.e. the CAS.*
> fields) was successful.
> But the second action of updating the record with additional
> information failed with errors of the type: HTTP method failed:
> HTTP/1.1 400 Bad Request
>
> In the solr.log file the error message indicated that it cannot index
> the field 'FileLocation'. Note that even simple curl commands that
> tried to update documents by adding a single field at a time failed
> with the same type of error. Somehow SOLR seemed incapable of adding
> new fields that were not explicitly defined in the managed-schema file.
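>
> One hedged way around this class of error is to declare the fields that
> the file manager writes explicitly in managed-schema. A sketch; only
> FileLocation comes from the actual error message, the Filename field is an
> illustrative guess:
>
>   <field name="FileLocation" type="string" indexed="true" stored="true"
>          multiValued="true"/>
>   <field name="Filename" type="string" indexed="true" stored="true"
>          multiValued="true"/>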
>
> =
> 2016-03-24 23:57:53.990 ERROR (qtp364656927-14) [   x:oodt]
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: 
> undefined
> field: "FileLocation"
> at
> org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:1209)
> at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.doAdd(AtomicUpdateDocumentMerger.java:133)
> at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.merge(AtomicUpdateDocumentMerger.java:89)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:1121)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1018)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
> at
> org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
> at
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
> at
> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:94)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> at
> org.eclipse.jetty.server.handler.Scoped

Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-02 Thread Konstantinos Mavrommatis
Hi,
I am trying to replicate a fully functional service that I had set up a long
time ago using OODT 0.6, but I am hitting the following problem, which
prevents me from ingesting files. When I try to ingest files with the
extension fastq.gz I get the line:
WARNING: No extractor specs specified for 
/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
And of course the file is not ingested. This process works without problems
with OODT 0.6 on a different server.

The crawler command I am running is:
./crawler_launcher \
--operation \
--launchAutoCrawler \
--productPath $FILEPATH \
--filemgrUrl $OODT_FILEMGR_URL \
--clientTransferer 
org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
--mimeExtractorRepo ../policy/mime-extractor-map.xml \
--noRecur \
--crawlForDirs 2>&1



I have set up OODT 0.12 on a server which runs the FM listening on port 9000.
From a client machine I have verified that I can use the FM to ingest products.
I am now trying to use the crawler to crawl and ingest all files in a
directory. Since I have non-standard MIME types in these directories I have
done the following:
1. Added my own mime types in policy/mimetypes.xml, e.g. (sketch below):
  











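Since the XML did not survive the list archive, here is a hypothetical sketch
of such an entry in Tika's mimetypes.xml format; the MIME type name and glob
patterns are illustrative assumptions:

  <mime-info>
    <mime-type type="application/x-fastq">
      <_comment>FASTQ sequencing reads, possibly gzipped</_comment>
      <glob pattern="*.fastq"/>
      <glob pattern="*.fastq.gz"/>
    </mime-type>
  </mime-info>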
2. Created the file policy/mime-extractor-map.xml (sketch below):










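This XML was also stripped by the archive. A sketch of what such a
mime-extractor-map.xml entry looks like, rebuilt around the attribute
fragments that survived in later quotes (the extractor class, the config file
path and the precondition id); the element names are recalled from the
crawler policy format and should be verified against a working policy file:

  <cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
                   magic="false" mimeRepo="mimetypes.xml">
    <mimetype name="application/x-fastq">
      <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
        <config file="/apache-oodt/crawler/bin/fastq.config"/>
        <preCondComparators>
          <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
        </preCondComparators>
      </extractor>
    </mimetype>
  </cas:mimetypemap>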
3. Created the file fastq.config (sketch below):

http://oodt.jpl.nasa.gov/1.0/cas";>
  

/apache-oodt/crawler/bin/MetExtractorNGS.pl
  
 
fastq
  
   



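The config was stripped too; a sketch of an ExternMetExtractor config built
around the two fragments that survived (the script path and the 'fastq'
argument). The element names here are assumptions to verify against the
cas-metadata documentation:

  <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
    <exec workingDir="">
      <extractorBinPath envReplace="true">/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
      <args>
        <arg isDataFile="true"/>
        <arg>fastq</arg>
      </args>
    </exec>
  </cas:externextractor>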

The MetExtractorNGS.pl is a small perl script that opens the file to be
ingested, gets some information, and stores it in the .met file that
corresponds to the file to be ingested. I have manually verified that it
works as expected and produces the correct met file.
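
For reference, the .met file such a script writes is plain CAS metadata XML
of the following shape; the keys below are invented examples, not the ones
MetExtractorNGS.pl actually produces:

  <cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
    <keyval>
      <key>SampleId</key>
      <val>E837642</val>
    </keyval>
    <keyval>
      <key>ReadDirection</key>
      <val>R1</val>
    </keyval>
  </cas:metadata>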

What am I missing here? Any ideas, comments, or suggestions will be greatly
appreciated.
Thanks in advance for any help
Kostas



PS1 The full output from running the crawler command follows:


Setting property 'StdProductCrawler.filemgrUrl'
Setting property 'MetExtractorProductCrawler.filemgrUrl'
Setting property 'AutoDetectProductCrawler.filemgrUrl'
Setting property 'StdProductCrawler.clientTransferer'
Setting property 'MetExtractorProductCrawler.clientTransferer'
Setting property 'AutoDetectProductCrawler.clientTransferer'
Setting property 'StdProductCrawler.noRecur'
Setting property 'MetExtractorProductCrawler.noRecur'
Setting property 'AutoDetectProductCrawler.noRecur'
Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
Setting property 'StdProductCrawler.productPath'
Setting property 'MetExtractorProductCrawler.productPath'
Setting property 'AutoDetectProductCrawler.productPath'
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.productPath' set to value 
[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value 
[../policy/mime-extractor-map.xml]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value 
[org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value 
[http://192.168.8.44:9000]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value 
[org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.filemgrUrl' set to value 
[http://192.168.8.44:9000]
Apr 02, 2016 10:12:13 PM 
org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'Aut

RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-04 Thread Konstantinos Mavrommatis
Hi,
It seems to be happening for a number of the file types that I have in the
mimetypes.xml.
A few things are puzzling to me: this file, which is a .gz file, is not
processed by the regular tika mimetypes, which cover gzip files; yet a file
that has no extension, which defaults to txt, is passed to the
MetExtractor.pl and processed.

Any ideas how I can find which preconditions fail? I tried to change the log
level to DEBUG for all components but I did not get much more information.
This must be something that changed in the OODT releases after 0.6, but I
could not find anything relevant in the release notes.
I also noticed in the documentation of the AutoDetectProductCrawler that it
uses the file met-extr-preconditions.xml, which I could not find anywhere in
the deployed OODT or the src directories. Could that be a reason for the
problem I observe?
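
One way to see which precondition comparator rejects a file - assuming the
comparators log through java.util.logging under the org.apache.oodt.cas.crawl
and org.apache.oodt.cas.metadata.preconditions packages, which is worth
verifying against the source - is to raise those loggers in
etc/logging.properties:

  org.apache.oodt.cas.crawl.level = FINEST
  org.apache.oodt.cas.metadata.preconditions.level = FINEST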

Thanks
K

-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
Sent: Monday, April 04, 2016 3:24 PM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor 
specifications

Hi Konstantinos,
It appears to be happening with a .log.gz file as well, right?

WARNING: No extractor specs specified for 
/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz

I wonder if it is the file names... However, I would be extremely surprised,
as I've seen much more verbose file naming.
Lewis

On Saturday, April 2, 2016, Konstantinos Mavrommatis < 
kmavromma...@celgene.com> wrote:

> Hi,
> I am trying to replicate a fully functional service that I had setup 
> long time ago using OODT 0.6 but I am having the following problem 
> that does not allow me to ingest files. When I try to ingest files 
> with the extension fastq.gz I get the line:
> WARNING: No extractor specs specified for 
> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fast
> q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM 
> org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> And of course the file is not ingested. This process works without 
> problem with OODT 0.6 on a different server.
>
> The crawler command I am running is:
> ./crawler_launcher \
> --operation \
> --launchAutoCrawler \
> --productPath $FILEPATH \
> --filemgrUrl $OODT_FILEMGR_URL \
> --clientTransferer
> org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \ 
> --mimeExtractorRepo ../policy/mime-extractor-map.xml \ --noRecur \ 
> --crawlForDirs 2>&1
>
>
>
> I have setup OODT 0.12 on a server which runs FM listening to port 9000.
> From a client machine I have verified that I can use FM to ingest products.
> I am now trying to use crawler to crawl and ingest all files in a 
> directory. Since I have non standard MIME types in these directories I 
> have done the following:
> 1. Added my own mime types in policy/mimetypes.xml eg
>   
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 2. created the file policy/mime-extractor-map.xml
>
> 
>  class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>  file="/apache-oodt/crawler/bin/fastq.config"/>
> 
>  id="CheckThatDataFileSizeIsGreaterThanZero"/>
> 
> 
> 
>
> 3. created the file fastq.config
>   xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas";>
>   
>
> /apache-oodt/crawler/bin/MetExtractorNGS.pl
>   
>  
> fastq
>   
>
> 
>
>
>
> The MetExtractorNGS.pl is a small perl script that opens the file to 
> be ingested, gets some information and stores it in the .met file that 
> corresponds to the file to be ingested and have manually verified that 
> works as expected producing the correct met file.
>
> What am I missing here? Any ideas comments suggestions will be greatly 
> appreciated.
> Thanks in advance for any help
> Kostas
>
>
>
> PS1 The full output from running the crawler command follows:
>
>
> Setting property 'StdProductCrawler.filemgrUrl'
> Setting property 'MetExtractorProductCrawler.filemgrUrl'
> Setting property 'AutoDetectProductCrawler.filemgrUrl'
> Setting property 'StdProductCrawler.clientTransferer'
> Setting property 'MetExtractorProductCrawler.clientTransferer'
> Setting property 'AutoDetectProductCrawler.clientTransferer'
> Setting 

RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

2016-04-06 Thread Konstantinos Mavrommatis
I am giving up on this.
I had used [1] in the first place to set up oodt (v0.6 back then); my setup
on the new system is identical to the old one.
I could not make much out of [0]. Among other things I tried to copy the
files from the old crawler/policy to the new crawler/policy - which included
legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried to
reinstall the full oodt on the client side, but it still did not work.

I ended up reverting to the older version (0.6), which I now run on my client.
The server (which runs the FM) is still 0.12, but the combination seems to be
working fine.

K

-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
Sent: Tuesday, April 05, 2016 3:33 AM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor 
specifications

Hi K,
OK so I did a bit of searching here and located a bunch of files which are 
defined as legacy... you can check the search results out below 
https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
I would urge you to have a look at the AutoDetectProductCrawler Javadoc 
description included in master branch [0] as well to see if you've got 
everything required.
Finally, I came across some documentation on the wiki which may guide you in
the right direction [1]. It may also be outdated though, so please let us
know if that is the case.
hth

[0]
https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
[1]
https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
 

On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis < 
kmavromma...@celgene.com> wrote:

> Hi,
> It seems to be happening for a number of types of files that I have in 
> the mimetypes.xml.
> A few things are puzzling to me: this file which is a .gz file is not 
> processed by the regular tika mimetypes which contains the gzip files 
> A file that has no extension, which defaults to txt is passed to the 
> MetExtractor.pl and processed.
>
> Any ideas I can find what are the preconditions that fail ? I tried to 
> change the log level to DEBUG for all components but I did not get 
> much more information. This must be something that changed in the OODT 
> releases
> >0.6 but could not find anything relevant in the release notes.
> I also noticed in the documentation  of the AutoDecectProductCrawler 
> that it uses the file met-extr-preconditions.xml which I could not 
> find anywhere in the deployed OODT or the src directories. Could that 
> be a reason for the problem I observe?
>
> Thanks
> K
>
> -Original Message-
> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> Sent: Monday, April 04, 2016 3:24 PM
> To: dev@oodt.apache.org
> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor 
> specifications
>
> Hi Konstantinos,
> It appears to be happening with a tar.gz file as well right?
>
> WARNING: No extractor specs specified for 
> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fast
> q/cas-crawler-04-02-16.log.gz
>
> I wonder if it is the file names... However I would be extremely 
> surprised as I've seen some much more verbose file naming.
> Lewis
>
> On Saturday, April 2, 2016, Konstantinos Mavrommatis < 
> kmavromma...@celgene.com> wrote:
>
> > Hi,
> > I am trying to replicate a fully functional service that I had setup 
> > long time ago using OODT 0.6 but I am having the following problem 
> > that does not allow me to ingest files. When I try to ingest files 
> > with the extension fastq.gz I get the line:
> > WARNING: No extractor specs specified for 
> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa
> > st q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM 
> > org.apache.oodt.cas.crawl.ProductCrawler
> > handleFile
> > And of course the file is not i