Hi, I am trying to ingest a large number of files. The metadata for these files is stored in .met files.
Many of the metadata fields contain characters such as '<', '>', '&' and '$'. Running the crawler on this metadata fails. When I try to escape the characters using HTML encoding (e.g. '>' becomes '&gt;'), I still get errors and the crawler cannot ingest the files. Here is an example of an offending line in the .met file, before and after HTML encoding:

Before:
<val>sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c /gpfs/archive/RED/DA0000072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 <(gunzip -c /gpfs/archive/RED/DA0000072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o /gpfs/archive/RED/DA0000072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt -p 8 --no_bias_correct </val>

After:
<val>sailfish quant --index /reference/v1/Homo-sapiens/GRCh37.p12/SailFishIndex --libtype 'T=PE:O=&gt;&lt;:S=AS' -1 &lt;(gunzip -c /gpfs/archive/RED/DA0000072/RNA-Seq/RawData/FastqFiles/HP1_3_R1.fastq.gz) -2 &lt;(gunzip -c /gpfs/archive/RED/DA0000072/RNA-Seq/RawData/FastqFiles/HP1_3_R2.fastq.gz) -o /gpfs/archive/RED/DA0000072/RNA-Seq/Processed/Sailfish-transcriptCounts/HP1_3.Sailfish.txt -p 8 --no_bias_correct </val>

If I remove the offending characters (in this case '<' and '>'), the ingestion goes on without any issues.

The crawler command is:

./crawler_launcher --operation --launchAutoCrawler \
    --productPath $FILEPATH \
    --filemgrUrl $OODT_FILEMGR_URL \
    --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
    --mimeExtractorRepo ../policy/mime-extractor-map.xml \
    --noRecur --crawlForDirs

The error message I get when I run the crawler is:

INFO: StdIngester: ingesting product: ProductName: [A1_1.Sailfish.sfish]: ProductType: [GenericFile]: FileLocation: [/datavault/RNA-Seq/Processed/Sailfish-transcriptCounts/]
org.apache.xmlrpc.XmlRpcException: java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
    at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
    at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:194)
    at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:185)
    at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:178)
    at org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient.ingestProduct(XmlRpcFileManagerClient.java:1178)
    at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:199)
    at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
    at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
    at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
    at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
    at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
    at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
Oct 07, 2014 11:17:18 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
SEVERE: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request -- rolling back ingest
java.lang.Exception: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
    at org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient.ingestProduct(XmlRpcFileManagerClient.java:1279)
    at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:199)
    at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
    at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
    at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
    at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
    at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
    at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
Oct 07, 2014 11:17:18 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
WARNING: exception ingesting product: [A1_1.Sailfish.sfish]: Message: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
Oct 07, 2014 11:17:18 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
WARNING: ProductCrawler: Exception ingesting product: [/datavault/RNA-Seq/Processed/Sailfish-transcriptCounts/A1_1.Sailfish.sfish]: Message: exception ingesting product: [A1_1.Sailfish.sfish]: Message: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request: attempting to continue crawling
org.apache.oodt.cas.filemgr.structs.exceptions.IngestException: exception ingesting product: [A1_1.Sailfish.sfish]: Message: Failed to ingest product [org.apache.oodt.cas.filemgr.structs.Product@7c1ba98b] : java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@5d3f1d87] : HTTP method failed: HTTP/1.1 400 Bad Request
    at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:204)
    at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
    at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
    at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
    at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
    at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
    at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
Oct 07, 2014 11:17:18 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to ingest product: [/datavault/RNA-Seq/Processed/Sailfish-transcriptCounts/A1_1.Sailfish.sfish]: performing postIngestFail actions

Any ideas how I can ingest these files?

Thanks,
K
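Since a .met file is XML, it may help to separate what XML itself requires from what happens after the file is read. In XML element text, '<' and '&' must always be written as entity references ('&lt;', '&amp;'), '>' strictly needs escaping only when it follows ']]', and '$' has no special meaning at all. The minimal Java sketch below uses a hypothetical helper class, MetValueEscaping (not part of OODT), to escape a value and then re-parse it with the JDK's standard DOM parser. Assuming the crawler reads .met files with a standard XML parser, it illustrates that entities such as '&gt;' are decoded back to '>' at read time, so HTML-encoding the .met file would likely not change what reaches the later catalog step that returns the 400 Bad Request.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical helper (not part of OODT): XML-escapes a metadata value for a
 * <val> element and shows that a standard DOM parser decodes it again.
 */
public class MetValueEscaping {

    /** Escapes the five predefined XML entities for use in element text. */
    static String escapeXml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Parses a single-element XML snippet and returns its decoded text. */
    static String parseVal(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Covers ParserConfigurationException, SAXException and IOException.
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the --libtype value from the .met example.
        String raw = "--libtype 'T=PE:O=><:S=AS' -1 <(gunzip -c R1.fastq.gz)";
        String escaped = escapeXml(raw);
        System.out.println("escaped: " + escaped);

        // Entities are decoded on parse, so the original characters come back.
        System.out.println("parsed:  " + parseVal("<val>" + escaped + "</val>"));

        // The unescaped value is rejected because of its bare '<' characters.
        try {
            parseVal("<val>" + raw + "</val>");
        } catch (IllegalArgumentException e) {
            System.out.println("raw value rejected: " + e.getMessage());
        }
    }
}
```

Running main prints the escaped value, the parsed value (identical to the original string), and a rejection message for the unescaped value. For that last case the JDK's default parser may also print a "[Fatal Error]" line to stderr.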