[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution
[ https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920482#action_12920482 ]

Karl Wright commented on CONNECTORS-116:
----------------------------------------

I just want to point out that Apache Legal is not the only stakeholder here. The community may also choose to take action regardless of the results of Apache Legal's review. Obviously, any such action should be done in accordance with procedures laid out by Apache Legal.

Possibly remove memex connector depending upon legal resolution
---------------------------------------------------------------

                Key: CONNECTORS-116
                URL: https://issues.apache.org/jira/browse/CONNECTORS-116
            Project: ManifoldCF
         Issue Type: Task
         Components: Memex connector
           Reporter: Robert Muir
           Assignee: Robert Muir

Apparently there is an IP problem with the memex connector code. Depending upon what Apache Legal says, we will take any action under this issue publicly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (CONNECTORS-115) Restarting the example fails when db present
[ https://issues.apache.org/jira/browse/CONNECTORS-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-115.
------------------------------------
       Resolution: Fixed
    Fix Version/s: LCF Release 0.5
         Assignee: Karl Wright

I found the fix that corresponds to this report.

r1006085 | kwright | 2010-10-08 20:43:45 -0400 (Fri, 08 Oct 2010) | 1 line
Fix problem with PostgreSQL implementation which causes second run of DBCreate to fail.

Restarting the example fails when db present
--------------------------------------------

                Key: CONNECTORS-115
                URL: https://issues.apache.org/jira/browse/CONNECTORS-115
            Project: ManifoldCF
         Issue Type: Bug
        Environment: Windows XP, example running with PostgreSQL instead of embedded Derby. Use defaults for dbname, user, and password.
           Reporter: Farzad
           Assignee: Karl Wright
            Fix For: LCF Release 0.5

When you restart the example you get the following:

C:\Program Files\Apache\apache-acf\example>java -jar start.jar
Configuration file successfully read
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: ERROR: database dbname already exists
        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:421)
        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:465)
        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1072)
        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:167)
        at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.createUserAndDatabase(DBInterfacePostgreSQL.java:508)
        at org.apache.manifoldcf.core.system.ManifoldCF.createSystemDatabase(ManifoldCF.java:638)
        at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:202)
Caused by: org.postgresql.util.PSQLException: ERROR: database dbname already exists
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1548)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1316)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:337)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:329)
        at org.apache.manifoldcf.core.database.Database.execute(Database.java:526)
        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:381)
C:\Program Files\Apache\apache-acf\example>

The only way to get it started again is to drop the database created the first time, in this case dbname.
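The root cause is that createUserAndDatabase issues CREATE DATABASE unconditionally, which PostgreSQL rejects on a second run. A minimal sketch of an idempotent guard, checking pg_database before creating; the class and method names here are hypothetical, not the actual r1006085 change, and a real driver would use a parameterized JDBC query rather than string concatenation:

```java
import java.util.Set;

// Hypothetical sketch of an idempotent database-creation check.
// The real driver would run the existence query over JDBC; here the
// result is abstracted as a Set so the decision logic stands alone.
public class DbCreateGuard {

    // SQL a driver could execute to test for an existing database.
    // (Use a parameterized query in real code, not concatenation.)
    public static String existenceQuery(String dbName) {
        return "SELECT 1 FROM pg_database WHERE datname = '" + dbName + "'";
    }

    // Only issue CREATE DATABASE when the database is absent.
    public static boolean shouldCreate(Set<String> existingDatabases, String dbName) {
        return !existingDatabases.contains(dbName);
    }
}
```

With this guard, a second run of DBCreate finds the database already present and simply skips the CREATE step instead of failing.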
[jira] Commented: (CONNECTORS-117) Database-specific maintenance activities such as reindexing should have their frequency be under the control of the database driver
[ https://issues.apache.org/jira/browse/CONNECTORS-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920489#action_12920489 ]

Karl Wright commented on CONNECTORS-117:
----------------------------------------

There are two kinds of activities that matter: ANALYZE and REINDEX, each on a respective table. Tracking the number of modifies, inserts, and deletes should remain the responsibility of the table manager itself, but a notification method for each activity should be implemented. The database driver (only PostgreSQL, so far) will then need to keep track of per-table data statically and make appropriate reindexing decisions. We can also readily add VACUUM FULL maintenance code under this same scheme.

I actually recommend using shared data (as defined within ILockManager) for this purpose. Cross-process statistics can then be tracked, and indexing requests can be coordinated. Eventually this stuff will be in ZooKeeper, so performance will be good. In the interim, we can commit changes to counts lazily (every 100 actions or so) to reduce the overhead.

Modification tracking data will not be lost or reset if the agents process is restarted in a multi-process system. This is new. In a single-process system, it WILL be lost upon restart, which is as it always has been. FWIW.

Database-specific maintenance activities such as reindexing should have their frequency be under the control of the database driver
-----------------------------------------------------------------------------------------------------------------------------------

                Key: CONNECTORS-117
                URL: https://issues.apache.org/jira/browse/CONNECTORS-117
            Project: ManifoldCF
         Issue Type: Improvement
         Components: Framework core
           Reporter: Karl Wright

Not all databases will require maintenance activity at the same frequency, and different versions of the same database may also differ in this way. Two changes should thus be made: (1) move the database maintenance frequency under the control of the database implementation, and (2) where appropriate, introduce properties.xml properties for each database where this is important.
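The lazy-count scheme described above can be sketched as a small per-table tracker: modification counts are committed every 100 events and maintenance is signalled once the committed total crosses a threshold. All names and constants below are illustrative only; the real implementation would persist counts via shared data (ILockManager), not a plain field.

```java
// Sketch of lazy, per-table modification tracking with a maintenance
// trigger, as discussed in the comment above. The flush interval and
// the reindex threshold are illustrative values.
public class TableMaintenanceTracker {
    private static final int FLUSH_INTERVAL = 100;   // commit counts lazily
    private final long reindexThreshold;             // modifications before REINDEX

    private long uncommitted = 0;   // events since the last lazy flush
    private long committed = 0;     // events persisted since last maintenance

    public TableMaintenanceTracker(long reindexThreshold) {
        this.reindexThreshold = reindexThreshold;
    }

    // Called on every insert/update/delete; returns true when the
    // driver should schedule maintenance (ANALYZE or REINDEX).
    public boolean noteModification() {
        uncommitted++;
        if (uncommitted >= FLUSH_INTERVAL) {
            committed += uncommitted;   // lazy commit to shared storage
            uncommitted = 0;
        }
        return committed >= reindexThreshold;
    }

    // Called after maintenance has run; resets the persisted count.
    public void maintenancePerformed() {
        committed = 0;
    }
}
```

Because flushes happen only every 100 events, a restart can lose at most the uncommitted remainder, which matches the "lazy commit to reduce overhead" trade-off Karl describes.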
[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution
[ https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920542#action_12920542 ]

Mark Miller commented on CONNECTORS-116:
----------------------------------------

Indeed - my impression is that we are all happy to see this code be pulled if that's what the original contributors want (or what they are legally bound to want) - we just think that process should be public before the code is silently taken out back and shot ;)
[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution
[ https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920568#action_12920568 ]

Jack Krupansky commented on CONNECTORS-116:
-------------------------------------------

It would be nice to see a comment about what would be required to add Memex support back. I note the following statement in the original incubation submission:

"It is unlikely that EMC, OpenText, Memex, or IBM would grant Apache-license-compatible use of these client libraries. Thus, the expectation is that users of these connectors obtain the necessary client libraries from the owners prior to building or using the corresponding connector. An alternative would be to undertake a clean-room implementation of the client API's, which may well yield suitable results in some cases (LiveLink, Memex, FileNet), while being out of reach in others (Documentum). Conditional compilation, for the short term, is thus likely to be a necessity."

Is it only the Memex connector that now has this problem? Do we need to do a clean-room implementation for Memex? For any of the others? FWIW, I don't see a Google connector for Memex.
[jira] Commented: (CONNECTORS-116) Possibly remove memex connector depending upon legal resolution
[ https://issues.apache.org/jira/browse/CONNECTORS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920573#action_12920573 ]

Karl Wright commented on CONNECTORS-116:
----------------------------------------

This is not about client libraries (which we do not include); it is about the connector code itself. And yes, only Memex has this problem: all of the other connectors developed with outside help used third-party contractors, under the typical arrangement that the code they developed belonged to MetaCarta. Memex was the only connector developed using the professional services of the target repository company.
[jira] Created: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
Crawled archive files should be expanded into their constituent files
---------------------------------------------------------------------

                Key: CONNECTORS-118
                URL: https://issues.apache.org/jira/browse/CONNECTORS-118
            Project: ManifoldCF
         Issue Type: New Feature
         Components: Framework crawler agent
           Reporter: Jack Krupansky

Archive files such as zip, mbox, tar, etc. should be expanded into their constituent files during crawling of repositories, so that any output connector would output the flattened archive. This could be an option, defaulted to ON, since someone may want to implement a copy connector that maintains crawled files as-is.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920609#action_12920609 ]

Karl Wright commented on CONNECTORS-118:
----------------------------------------

The key question here is how you describe a component of an archive. There must be a URL to describe it, or there is no way the search results are going to mean anything. Since URLs are the connector's job to assemble, this is likely to be connector-specific. Also, most connectors will never be dealing with archives. Can you provide a list of connectors where you believe this is important, and what the URLs to get at the subpieces of the archive would look like?
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920705#action_12920705 ]

Karl Wright commented on CONNECTORS-118:
----------------------------------------

So this scheme is specific to Apache VFS. What connectors are used to crawl Apache VFS file systems? Just the file system connector, no?
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920730#action_12920730 ]

Jack Krupansky commented on CONNECTORS-118:
-------------------------------------------

Support within the file system connector is obviously the higher priority. Windows shares as well. And FTP/SFTP.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920781#action_12920781 ]

Karl Wright commented on CONNECTORS-118:
----------------------------------------

bq. So, if somebody wants to de-reference one of these pseudo-URLs they must:

Ah. So what you are saying is that the person must either be running a custom browser, or must do some kind of URL manipulation before the search results are presented to the user, or what, exactly?

If the URL is in fact meant to be real, then it should refer to a custom proxy of some kind that would perform the necessary breakdown. If there is no such service or proxy, those URLs will simply be broken, which represents a major violation of the contract for URL generation within ManifoldCF connectors. If there is no such proxy that you are aware of, then I'd much rather generate a real URL, one which in its raw form would send you to nothing other than the archive itself, but which has enough information to be interpreted properly, by using the anchor trick I alluded to earlier. If there *is* such a proxy, then that proxy's parameters must be added as part of the repository connection configuration.

The only case in which the solution you suggest is valid is if you are working on a file system where, when you enter bz://... as the URL in your browser, it actually does the unpacking for you. That would *not* include CIFS, by the way. Is this a fair statement of your proposal, or am I missing something?
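The "anchor trick" Karl alludes to, a URL whose raw form resolves only to the archive itself while the fragment carries enough information to locate the member inside it, could look like the following sketch. The fragment convention is invented here purely for illustration; ManifoldCF defines no such helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the "anchor trick": a browser dereferencing this URL lands
// on the archive itself, while a smarter consumer can read the member
// path back out of the fragment. The encoding convention is invented.
public class ArchiveMemberUrl {
    public static String build(String archiveUrl, String memberPath) {
        String fragment = URLEncoder.encode(memberPath, StandardCharsets.UTF_8);
        return archiveUrl + "#" + fragment;
    }
}
```

For example, build("http://example.com/logs.tar.gz", "2010/oct/report.txt") yields a URL that fetches the archive in any browser, yet still identifies the constituent file for a consumer that understands the fragment.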
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920787#action_12920787 ]

Jack Krupansky commented on CONNECTORS-118:
-------------------------------------------

Aperture's approach was just a starting point for discussion of how to form an id for a file within an archive file. As long as the MCF rules are functionally equivalent to the Apache VFS rules, we should be okay. In short, my proposal does not have a requirement for what an id should look like, just a suggestion.
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ]

Jack Krupansky commented on CONNECTORS-118:
-------------------------------------------

One of those VFS links points to all the Java packages used to access the list of archive formats I listed. I have personally written unit tests that generated most of those formats, which Aperture then extracted.
[jira] Issue Comment Edited: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ]

Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:
---------------------------------------------------------------------

I have personally written unit tests that generated most of those formats, which Aperture then extracted. See: http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

org.apache.tools.bzip2 - BZIP2 archives
java.util.zip.GZIPInputStream - GZIP archives
javax.mail - message/rfc822-style messages and mbox files
org.apache.tools.tar - tar archives
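Of the packages listed, java.util.zip ships with the JDK. A minimal round trip through GZIPOutputStream and GZIPInputStream shows the kind of expansion a sub-crawler performs on a compressed entry; this is a self-contained sketch, not Aperture or ManifoldCF code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Minimal GZIP expansion using only the JDK, mirroring what an
// archive-aware sub-crawler would do with a compressed document.
public class GzipExpand {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);   // compress the constituent file's bytes
        }
        return bos.toByteArray();
    }

    public static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);   // stream the expanded bytes out
            }
        }
        return bos.toByteArray();
    }
}
```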
[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files
[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920827#action_12920827 ]

Karl Wright commented on CONNECTORS-118:
----------------------------------------

Agreed, the file system is quite straightforward, although CIFS may be a bit more challenging, depending on whether the archive-processing code accepts an InputStream as input. If it does, there would be no need to make a secondary copy in either case.
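On the InputStream question: the JDK's own zip support does accept one. java.util.zip.ZipInputStream wraps any InputStream, so an archive streamed off a CIFS share could in principle be expanded as it arrives, with no secondary copy. A self-contained sketch (the in-memory zip exists only to make the example runnable):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// ZipInputStream takes any InputStream, so an archive coming off a
// network share can be expanded while streaming, with no temp file.
public class ZipStreaming {
    // List the entry names of a zip delivered as a plain InputStream.
    public static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Build a tiny zip in memory so the example is self-contained.
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bos)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("alpha".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("beta".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bos.toByteArray();
    }
}
```

The same entryNames method works whether the stream comes from a local file, a CIFS share, or an in-memory buffer, which is the property Karl is relying on.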