Re: [Bibdesk-users] Importing PDF files
On Oct 14, 2011, at 14:57 , Fischlin Andreas wrote:

> I guess you tested this with records where the doi is actually known?

As a matter of fact, I did. It wouldn't have been much of a test, otherwise, would it? It took a while to find a way to search on DOI in the web interface, but I'm more familiar with Compendex than WoS.

> As I warned, ISI WOS offers doi's only for the more recent records.
>
> Then I guess, that's the trouble with ISI WOS (Thomson Research). I told you
> that they tried to promote their proprietary ISI WOS designators and to
> ignore doi's as long as possible. They had finally to give in, but not
> implementing it in soap is obviously a "good way" for them to continue
> sabotaging doi.

I can't say whether it's malicious, but the Thomson documentation that I have for their service is from 2005, when DOI wasn't nearly as ubiquitous as it is today. The help page we link to in the BD manual is

http://images.webofknowledge.com/WOK46/help/WOS/h_advanced_examples.html

which doesn't mention the DO key for DOI, so they may have only started indexing it recently.

--
Adam

--
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
___
Bibdesk-users mailing list
Bibdesk-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bibdesk-users
Re: [Bibdesk-users] Importing PDF files
I guess you tested this with records where the DOI is actually known? As I warned, ISI WOS offers DOIs only for the more recent records.

Then I guess that's the trouble with ISI WOS (Thomson Research). I told you that they tried to promote their proprietary ISI WOS designators and to ignore DOIs as long as possible. They finally had to give in, but not implementing it in SOAP is obviously a "good way" for them to continue sabotaging DOI.

Regards,
Andreas

On 14/10/2011, at 18:30 , Maxwell, Adam R wrote:

> On Oct 12, 2011, at 08:47, Fischlin Andreas wrote:
>
>> Searching a single publication through the doi might be fast, however, will
>> work only for more recent publications
>
> I tried this briefly. It seems that searching WoS based on DOI is only
> allowed through the web interface, not SOAP, so we're out of luck. Too
> bad…this was the first feature I'd looked forward to in a long time :(.
>
> --
> Adam
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 08:47, Fischlin Andreas wrote:

> Searching a single publication through the doi might be fast, however, will
> work only for more recent publications

I tried this briefly. It seems that searching WoS based on DOI is only allowed through the web interface, not SOAP, so we're out of luck. Too bad…this was the first feature I'd looked forward to in a long time :(.

--
Adam
Re: [Bibdesk-users] Importing PDF files
Indeed, I would support that suggestion. I have had it on for years and have never regretted it (so far) ;-)

Regards,
Andreas

On 13/10/2011, at 16:32 , Gregory Jefferis wrote:

> On 12 Oct 2011, at 14:23, Adam R. Maxwell wrote:
>
>> On Oct 12, 2011, at 05:09 , M. Tamer Özsu wrote:
>>
>>> I was simply wondering out loud how some of these other programs manage to
>>> extract the title/author/... data from the PDF files to at least attempt to
>>> generate some of this citation information. I now understand that Bibdesk
>>> does not do this, and that is perfectly fine.
>>
>> They do it by scraping information from the PDF, including the DOI. BibDesk
>> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
>> hidden preference. I don't use it myself, since it only searches PubMed.
>> Pretty similar code could probably be used to run a Web of Science search,
>> though, come to think of it...
>
> Is there any reason why this is turned off by default? I actually wrote most
> of that code and I thought that it had broken because I did not notice the
> hidden pref. For biologists/medics, this is an incredibly timesaver and it
> works with nearly all modern PDFs in this domain.
>
> People don't normally drop PDFs onto BibDesk unless they have an existing
> reference. So I don't see that setting this to true by default would much
> inconvenience anyone. Setting it to false means that very few people will
> ever notice and use it.
>
> Best,
>
> Greg.
Re: [Bibdesk-users] Importing PDF files
On 12 Oct 2011, at 14:23, Adam R. Maxwell wrote:

> On Oct 12, 2011, at 05:09 , M. Tamer Özsu wrote:
>
>> I was simply wondering out loud how some of these other programs manage to
>> extract the title/author/... data from the PDF files to at least attempt to
>> generate some of this citation information. I now understand that Bibdesk
>> does not do this, and that is perfectly fine.
>
> They do it by scraping information from the PDF, including the DOI. BibDesk
> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
> hidden preference. I don't use it myself, since it only searches PubMed.
> Pretty similar code could probably be used to run a Web of Science search,
> though, come to think of it...

Is there any reason why this is turned off by default? I actually wrote most of that code, and I thought that it had broken because I did not notice the hidden pref. For biologists/medics, this is an incredible timesaver and it works with nearly all modern PDFs in this domain.

People don't normally drop PDFs onto BibDesk unless they have an existing reference. So I don't see that setting this to true by default would much inconvenience anyone. Setting it to false means that very few people will ever notice and use it.

Best,

Greg.
Re: [Bibdesk-users] Importing PDF files
On Oct 13, 2011, at 06:42 , Fischlin Andreas wrote:

> I was talking only on the web scraping. That AFAIK costs a lot, e.g.
> http://www.automationanywhere.com/solutions/screenScrape.htm. But perhaps
> that's not an issue at all.

I think we're talking at cross purposes. BibDesk's screen scraping is free; that's the technique it uses in the web group(s). The sites it can scrape, OTOH, may require a subscription for access.

--
adam
Re: [Bibdesk-users] Importing PDF files
I was talking only about the web scraping. That AFAIK costs a lot, e.g. http://www.automationanywhere.com/solutions/screenScrape.htm. But perhaps that's not an issue at all.

Regards,
Andreas

On 13/10/2011, at 15:00 , Adam R. Maxwell wrote:

> On Oct 13, 2011, at 00:57 , Fischlin Andreas wrote:
>
>> AFAIK those services cost a lot. I do not see how that is compatible with an
>> open source BibDesk.
>
> I don't see what on earth you're talking about. We've supported Web of
> Science searching in BibDesk for several years, and many (most?) of the web
> groups are for-pay sites such as IEEE.
>
> --
> Adam
Re: [Bibdesk-users] Importing PDF files
On Oct 13, 2011, at 00:57 , Fischlin Andreas wrote:

> AFAIK those services cost a lot. I do not see how that is compatible with an
> open source BibDesk.

I don't see what on earth you're talking about. We've supported Web of Science searching in BibDesk for several years, and many (most?) of the web groups are for-pay sites such as IEEE.

--
Adam
Re: [Bibdesk-users] Importing PDF files
AFAIK those services cost a lot. I do not see how that is compatible with an open source BibDesk.

Regards,
Andreas

On 13/10/2011, at 02:35 , Maxwell, Adam R wrote:

> On Oct 12, 2011, at 17:29, Douglas Stebila wrote:
>
>> On 2011-10-13, at 0:52, "Adam R. Maxwell" wrote:
>>
>>> AFAIK, none of the screen-scraping sites in the web group are suitable for
>>> a query, unfortunately. You need a service such as PubMed or Web of
>>> Science with an actual API.
>>
>> Couldn't you visit the page defined by the DOI, and, if it's a page from a
>> site that the web group scrapers know how to scrape, then it scrapes that
>> data? Obviously suffers from the limitations of scraping being inaccurate /
>> fragile, but it should work a bit.
>
> Oh, that's a neat idea, and it probably would work. I was thinking you'd
> somehow craft a query string for each site, which is harder.
>
> --
> Adam
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 17:29, Douglas Stebila wrote:

> On 2011-10-13, at 0:52, "Adam R. Maxwell" wrote:
>
>> AFAIK, none of the screen-scraping sites in the web group are suitable for a
>> query, unfortunately. You need a service such as PubMed or Web of Science
>> with an actual API.
>
> Couldn't you visit the page defined by the DOI, and, if it's a page from a
> site that the web group scrapers know how to scrape, then it scrapes that
> data? Obviously suffers from the limitations of scraping being inaccurate /
> fragile, but it should work a bit.

Oh, that's a neat idea, and it probably would work. I was thinking you'd somehow craft a query string for each site, which is harder.

--
Adam
Re: [Bibdesk-users] Importing PDF files
On 2011-10-13, at 0:52, "Adam R. Maxwell" wrote:

> AFAIK, none of the screen-scraping sites in the web group are suitable for a
> query, unfortunately. You need a service such as PubMed or Web of Science
> with an actual API.

Couldn't you visit the page defined by the DOI, and, if it's a page from a site that the web group scrapers know how to scrape, then it scrapes that data? Obviously suffers from the limitations of scraping being inaccurate / fragile, but it should work a bit.

Douglas
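Douglas's suggestion above (resolve the DOI, see where it lands, and hand the page to a scraper only if the host is one that is already known) could be sketched roughly as follows. This is not BibDesk code: the scraper functions, the host table, and the use of the doi.org resolver are assumptions made for illustration, and the network fetch is deliberately left out.

```python
from urllib.parse import quote, urlparse


def doi_url(doi):
    """Build the resolver URL for a DOI (assumes the public doi.org resolver)."""
    return "https://doi.org/" + quote(doi, safe="/")


def scrape_springer(html):
    """Hypothetical placeholder for a site-specific scraper."""
    return {"source": "springer", "length": len(html)}


def scrape_ieee(html):
    """Hypothetical placeholder for a site-specific scraper."""
    return {"source": "ieee", "length": len(html)}


# Illustrative host table; a real one would mirror whatever sites the
# web group scrapers actually support.
SCRAPERS = {
    "link.springer.com": scrape_springer,
    "ieeexplore.ieee.org": scrape_ieee,
}


def scraper_for(landing_url):
    """Return the scraper registered for the landing page's host, or None."""
    host = urlparse(landing_url).hostname or ""
    return SCRAPERS.get(host.lower())


# The landing URL would normally come from following the redirects of
# doi_url(...); it is hard-coded here so the sketch stays offline.
landing = "https://link.springer.com/article/10.1234/example"
scraper = scraper_for(landing)
print(doi_url("10.1234/example.2011.001"), scraper.__name__ if scraper else None)
```

The point of keying on the landing host is that no per-site query string is needed; an unknown host simply yields no scraper and the lookup falls through.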
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 08:18, Christiaan Hofman wrote:

> On Oct 12, 2011, at 16:50, Adam R. Maxwell wrote:
>
>> On Oct 12, 2011, at 07:17 , Christiaan Hofman wrote:
>>
>>> Does WoS run well synchronously?
>>
>> All of the SOAP calls are synchronous; they use the "WSGeneratedObj-sync"
>> runloop mode to block until results are available. As to whether it runs
>> well...I guess that depends :).
>
> Not what I meant. I mean synchronous as in synchronous with the app, i.e. on
> the main thread. The ISI search group runs the SOAP calls from a secondary
> thread.

I'm not sure what you mean, then. The thread it runs on doesn't matter, though it runs on a secondary thread precisely because the search will block the calling thread (main or otherwise). This should be consistent with the PubMed usage in BibItem_PubMedLookup.

If I were designing the search groups today, I'd take a significantly different approach to the async problem, but that's not terribly relevant. I did this in my testing version of BD for BDSKPreviewer, eliminating all the crazy DO and locking.

--
Adam
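The distinction discussed above, a call that blocks whichever thread makes it and is therefore pushed onto a secondary thread, can be illustrated with a small sketch. It is not how BibDesk implements its search groups; `blocking_search` is a hypothetical stand-in for a synchronous SOAP request, and the thread pool is just one convenient way to show the idea.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_search(query):
    """Hypothetical stand-in for a synchronous SOAP call: blocks, then returns."""
    time.sleep(0.1)
    return ["result for " + query]


def search_in_background(executor, query, on_done):
    """Run the blocking call on a worker thread and report back via a callback."""
    future = executor.submit(blocking_search, query)
    # Note: the callback normally runs on the worker thread; a GUI app
    # would hand the results back to its main thread at this point.
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


collected = []
with ThreadPoolExecutor(max_workers=1) as executor:
    search_in_background(executor, "doi 10.1234/example", collected.extend)
    # The calling thread is not blocked here and could keep handling events.
# Leaving the "with" block waits for the worker to finish.
print(collected)
```

The search itself is still synchronous, as Adam says; only the thread that pays for the wait changes.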
Re: [Bibdesk-users] Importing PDF files
I use ISI WOS daily through BibDesk's SOAP interface. It has worked fairly well all the time in the last months.

Searching a single publication through the DOI might be fast, but it will work only for more recent publications, since the company tried for years to stuff their own proprietary reference ID system down our throats. The latter has worked well for a very long time.

Accessing ISI WOS via DOI or their own ISI ID might actually be a very nice feature to have in BD. Searching a record via ISI ID is something I do only via browser, i.e. I use a URL similar to this

http://links.isiglobalnet2.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=ResearchSoft&SrcApp=EndNote&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000268938300035

which is often also written as

://000268938300035

As you can see, the only variable element denoting a specific record in ISI WOS is KeyUT, in the above examples

000268938300035

I do not know what the corresponding SOAP syntax would be.

Regards,
Andreas

On 12/10/2011, at 16:17 , Christiaan Hofman wrote:

> On Oct 12, 2011, at 15:23, Adam R. Maxwell wrote:
>
>> On Oct 12, 2011, at 05:09 , M. Tamer Özsu wrote:
>>
>>> I was simply wondering out loud how some of these other programs manage to
>>> extract the title/author/... data from the PDF files to at least attempt to
>>> generate some of this citation information. I now understand that Bibdesk
>>> does not do this, and that is perfectly fine.
>>
>> They do it by scraping information from the PDF, including the DOI. BibDesk
>> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
>> hidden preference. I don't use it myself, since it only searches PubMed.
>> Pretty similar code could probably be used to run a Web of Science search,
>> though, come to think of it...
>>
>> --
>> Adam
>
> Does WoS run well synchronously? I thought that was the main reason only
> PubMed is searched.
>
> Christiaan
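Andreas's observation in the message above, that KeyUT is the only part of the gateway URL that varies per record, can be captured in a few lines. The fixed parameters are copied from his example URL; the function names are made up for this sketch, and nothing here is taken from BibDesk or from Thomson's documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base address and fixed parameters as they appear in the example URL.
GATEWAY = "http://links.isiglobalnet2.com/gateway/Gateway.cgi"
FIXED_PARAMS = [
    ("GWVersion", "2"),
    ("SrcAuth", "ResearchSoft"),
    ("SrcApp", "EndNote"),
    ("DestLinkType", "FullRecord"),
    ("DestApp", "WOS"),
]


def wos_record_url(key_ut):
    """Build a gateway URL for one record; only KeyUT changes between records."""
    return GATEWAY + "?" + urlencode(FIXED_PARAMS + [("KeyUT", key_ut)])


def key_ut_from_url(url):
    """Recover the KeyUT value from a gateway URL, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("KeyUT", [])
    return values[0] if values else None


url = wos_record_url("000268938300035")
print(url)
print(key_ut_from_url(url))
```

This only covers the browser link; the equivalent SOAP query remains an open question in the thread.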
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 16:50, Adam R. Maxwell wrote:

> On Oct 12, 2011, at 07:17 , Christiaan Hofman wrote:
>
>> Does WoS run well synchronously?
>
> All of the SOAP calls are synchronous; they use the "WSGeneratedObj-sync"
> runloop mode to block until results are available. As to whether it runs
> well...I guess that depends :).

Not what I meant. I mean synchronous as in synchronous with the app, i.e. on the main thread. The ISI search group runs the SOAP calls from a secondary thread.

>> I thought that was the main reason only PubMed is searched.
>
> I think it's more because that code started out as PubMed ID-based lookup,
> and the DOI code was added as an enhancement some years later.

Yes, but at that time I remember there was some idea of making it more generic, but the idea was dropped mainly because of this.

Christiaan
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 07:10 , Douglas Stebila wrote:

> On 2011-Oct-12, at 11:23 PM, Adam R. Maxwell wrote:
>
>> They do it by scraping information from the PDF, including the DOI. BibDesk
>> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
>> hidden preference. I don't use it myself, since it only searches PubMed.
>> Pretty similar code could probably be used to run a Web of Science search,
>> though, come to think of it…
>
> … or in fact any of the sites that are currently supported for import, such
> as Springer, etc.

AFAIK, none of the screen-scraping sites in the web group are suitable for a query, unfortunately. You need a service such as PubMed or Web of Science with an actual API.

--
Adam
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 07:17 , Christiaan Hofman wrote:

> Does WoS run well synchronously?

All of the SOAP calls are synchronous; they use the "WSGeneratedObj-sync" runloop mode to block until results are available. As to whether it runs well...I guess that depends :).

> I thought that was the main reason only PubMed is searched.

I think it's more because that code started out as PubMed ID-based lookup, and the DOI code was added as an enhancement some years later.
Re: [Bibdesk-users] Importing PDF files
On 2011-Oct-12, at 11:23 PM, Adam R. Maxwell wrote:

> They do it by scraping information from the PDF, including the DOI. BibDesk
> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
> hidden preference. I don't use it myself, since it only searches PubMed.
> Pretty similar code could probably be used to run a Web of Science search,
> though, come to think of it…

… or in fact any of the sites that are currently supported for import, such as Springer, etc.

Douglas
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 15:23, Adam R. Maxwell wrote:

> On Oct 12, 2011, at 05:09 , M. Tamer Özsu wrote:
>
>> I was simply wondering out loud how some of these other programs manage to
>> extract the title/author/... data from the PDF files to at least attempt to
>> generate some of this citation information. I now understand that Bibdesk
>> does not do this, and that is perfectly fine.
>
> They do it by scraping information from the PDF, including the DOI. BibDesk
> can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm
> hidden preference. I don't use it myself, since it only searches PubMed.
> Pretty similar code could probably be used to run a Web of Science search,
> though, come to think of it...
>
> --
> Adam

Does WoS run well synchronously? I thought that was the main reason only PubMed is searched.

Christiaan
Re: [Bibdesk-users] Importing PDF files
On Oct 12, 2011, at 05:09 , M. Tamer Özsu wrote:

> I was simply wondering out loud how some of these other programs manage to
> extract the title/author/... data from the PDF files to at least attempt to
> generate some of this citation information. I now understand that Bibdesk
> does not do this, and that is perfectly fine.

They do it by scraping information from the PDF, including the DOI. BibDesk can also do this, using the BDSKShouldParsePDFToGeneratePubMedSearchTerm hidden preference. I don't use it myself, since it only searches PubMed. Pretty similar code could probably be used to run a Web of Science search, though, come to think of it...

--
Adam
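As a rough illustration of the "scrape the DOI out of the PDF" step Adam describes, here is a minimal sketch that looks for a DOI-shaped string in text that has already been extracted from a PDF. The regular expression is an assumption (a common heuristic, not BibDesk's actual parsing code), and getting the text out of the PDF in the first place is outside this example.

```python
import re

# Assumed heuristic: "10." + a 4-9 digit registrant code + "/" + a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


sample = ("Journal of Examples 12 (2011) 34-56. "
          "doi:10.1234/example.2011.001. All rights reserved.")
print(find_doi(sample))
```

Once a DOI is in hand, it becomes the search term for whichever service can be queried by DOI, which is the part of the problem the rest of the thread is about.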
Re: [Bibdesk-users] Importing PDF files
Hi Andreas,

Thanks for this. As I noted in a separate response, the path was easy to fix, and I knew that -- what I didn't know was to set the default fields. Now the file shows up.

The issue of importing PDF files was a separate question that I wondered about. I was not at all criticizing Bibdesk, which I use daily and regularly (and it holds the citation information for all of my writings in LaTeX), nor was I comparing it with Papers. I was simply wondering out loud how some of these other programs manage to extract the title/author/... data from the PDF files to at least attempt to generate some of this citation information. I now understand that Bibdesk does not do this, and that is perfectly fine.

Best.

==Tamer

On 2011-10-12, at 11:18 AM, Fischlin Andreas wrote:

> Hi Tamer Özsu,
>
> Look at the data you provided:
>
> file = {:Users/tozsu/Documents/Collected Papers/Proceedings Papers/EDBT/EDBT 2011/LID11(EDBT)\_a6-zaniolo.pdf},
>
> This is AFAIK not a valid file path, e.g. it starts with a ":". I guess
> Christiaan's word use 'junk' is more too often true than not. Making wild
> guesses, as perhaps other programs such as Papers do, may also not be always
> helpful, because it can be very confusing if it once works and once not for
> reasons very difficult to comprehend. And if you look at other program
> characteristics, e.g. Papers' lousy cite key generation, you find that
> perhaps a small advantage as guessing when pdf metadata are bad is paid with
> deficiencies that are intolerable (at least for me).
>
> This is just meant as some considerations to also take into account before
> criticizing BibDesk too much.
>
> Sincerely yours,
> Andreas Fischlin
>
> ETH Zurich
> Prof. Dr. Andreas Fischlin
> Systems Ecology - Institute of Integrative Biology
> CHN E 21.1
> Universitaetstrasse 16
> 8092 Zurich
> SWITZERLAND
>
> andreas.fisch...@env.ethz.ch
> www.sysecol.ethz.ch
>
> +41 44 633-6090 phone
> +41 44 633-1136 fax
> +41 79 221-4657 mobile
>
> Make it as simple as possible, but distrust it!
>
> On 12/10/2011, at 10:10 , M. Tamer Özsu wrote:
>
>> Thank you. I'll study that.
>>
>> I wonder how programs such as Papers and Mendeley are able to extract that
>> info from PDF files -- or do they not extract them from the PDFs?
>>
>> ==Tamer
>>
>> On 2011-10-11, at 11:20 PM, Maxwell, Adam R wrote:
>>
>>> On Oct 11, 2011, at 14:16, M. Tamer Özsu wrote:
>>>
>>>> Some PDF files do have this information included as metadata that some
>>>> programs are able to extract. I thought there might be a mechanism such as
>>>> that, but I understand that there isn't.
>>>
>>> There is, but most of that metadata is junk. Look for
>>> BDSKShouldUsePDFMetadata on this page:
>>>
>>> http://sourceforge.net/apps/mediawiki/bibdesk/index.php?title=Tips_and_Tricks
Re: [Bibdesk-users] Importing PDF files
No, they often do not extract them from the PDF. They often search in the PDF for a DOI and then provide the actual data from a provider such as ISI WOS or other database sources. BibDesk does not do that. As a consequence you see in BibDesk how lousy the metadata in the PDF generally are (at least in my experience).

Regards,
Andreas

ETH Zurich
Prof. Dr. Andreas Fischlin
Systems Ecology - Institute of Integrative Biology
CHN E 21.1
Universitaetstrasse 16
8092 Zurich
SWITZERLAND

andreas.fisch...@env.ethz.ch
www.sysecol.ethz.ch

+41 44 633-6090 phone
+41 44 633-1136 fax
+41 79 221-4657 mobile

Make it as simple as possible, but distrust it!

On 12/10/2011, at 10:10 , M. Tamer Özsu wrote:

> Thank you. I'll study that.
>
> I wonder how programs such as Papers and Mendeley are able to extract that
> info from PDF files -- or do they not extract them from the PDFs?
>
> ==Tamer
>
> On 2011-10-11, at 11:20 PM, Maxwell, Adam R wrote:
>
>> On Oct 11, 2011, at 14:16, M. Tamer Özsu wrote:
>>
>>> Some PDF files do have this information included as metadata that some
>>> programs are able to extract. I thought there might be a mechanism such as
>>> that, but I understand that there isn't.
>>
>> There is, but most of that metadata is junk. Look for
>> BDSKShouldUsePDFMetadata on this page:
>>
>> http://sourceforge.net/apps/mediawiki/bibdesk/index.php?title=Tips_and_Tricks
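The approach Andreas describes (search the PDF text for a DOI, then look the record up at a bibliographic provider) might start with something like the Python sketch below. It is illustrative only and is not BibDesk code: the name find_doi and the pattern are assumptions, it presumes the PDF text has already been extracted by some other tool, and the lookup at the provider is left out entirely.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, a slash, and a suffix
# that runs until whitespace, a quote, or an angle bracket. Unusual DOIs may
# not match, so treat a miss as "unknown", not as "no DOI".
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    # (A DOI that really ends in one of these characters would be cut short.)
    return match.group(0).rstrip(".,;:)")


if __name__ == "__main__":
    sample = "Published online, doi:10.1000/xyz123. All rights reserved."
    print(find_doi(sample))
```

As noted elsewhere in this thread, older records often carry no DOI at all, so any tool built this way still needs a fallback for PDFs where nothing is found.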
Re: [Bibdesk-users] Importing PDF files
Hi Tamer Özsu,

Look at the data you provided:

file = {:Users/tozsu/Documents/Collected Papers/Proceedings Papers/EDBT/EDBT 2011/LID11(EDBT)\_a6-zaniolo.pdf},

AFAIK this is not a valid file path; for example, it starts with a ":". I guess Christiaan's use of the word 'junk' is true more often than not.

Making wild guesses, as other programs such as Papers perhaps do, may also not always be helpful, because it can be very confusing if it works one time and not the next, for reasons that are very difficult to comprehend. And if you look at other program characteristics, e.g. Papers' lousy cite key generation, you find that a small advantage, such as guessing when the PDF metadata are bad, may be paid for with deficiencies that are intolerable (at least for me).

These are just some considerations to take into account before criticizing BibDesk too much.

Sincerely yours,
Andreas Fischlin

On 12/10/2011, at 10:10 , M. Tamer Özsu wrote:
> Thank you. I'll study that.
>
> I wonder how programs such as Papers and Mendeley are able to extract that
> info from PDF files -- or do they not extract them from the PDFs?
>
> ==Tamer
>
> On 2011-10-11, at 11:20 PM, Maxwell, Adam R wrote:
>
>> On Oct 11, 2011, at 14:16, M. Tamer Özsu wrote:
>>
>>> Some PDF files do have this information included as metadata that some
>>> programs are able to extract. I thought there might be a mechanism such as
>>> that, but I understand that there isn't.
>>
>> There is, but most of that metadata is junk. Look for
>> BDSKShouldUsePDFMetadata on this page:
>>
>> http://sourceforge.net/apps/mediawiki/bibdesk/index.php?title=Tips_and_Tricks
Re: [Bibdesk-users] Importing PDF files
Thank you. I'll study that.

I wonder how programs such as Papers and Mendeley are able to extract that info from PDF files -- or do they not extract them from the PDFs?

==Tamer

On 2011-10-11, at 11:20 PM, Maxwell, Adam R wrote:
> On Oct 11, 2011, at 14:16, M. Tamer Özsu wrote:
>
>> Some PDF files do have this information included as metadata that some
>> programs are able to extract. I thought there might be a mechanism such as
>> that, but I understand that there isn't.
>
> There is, but most of that metadata is junk. Look for
> BDSKShouldUsePDFMetadata on this page:
>
> http://sourceforge.net/apps/mediawiki/bibdesk/index.php?title=Tips_and_Tricks
Re: [Bibdesk-users] Importing PDF files
On Oct 11, 2011, at 14:16, M. Tamer Özsu wrote:
> Some PDF files do have this information included as metadata that some
> programs are able to extract. I thought there might be a mechanism such as
> that, but I understand that there isn't.

There is, but most of that metadata is junk. Look for BDSKShouldUsePDFMetadata on this page:

http://sourceforge.net/apps/mediawiki/bibdesk/index.php?title=Tips_and_Tricks
Re: [Bibdesk-users] Importing PDF files
Some PDF files do have this information included as metadata that some programs are able to extract. I thought there might be a mechanism such as that, but I understand that there isn't.

--
M. Tamer Özsu
University of Waterloo
(Currently on sabbatical leave at ETH Zürich)

On 2011-10-11, at 11:04 PM, Christiaan Hofman wrote:
> On Oct 11, 2011, at 22:48, M. Tamer Özsu wrote:
>
>> I think I am doing something wrong and would appreciate any help. I thought
>> it was possible to drag and drop a PDF file on the bibdesk library, but
>> although the PDF is included as an entry, it has empty title (all the fields
>> are empty in fact). I am dropping it to the top window when bibdesk is open.
>> Am I doing something wrong somewhere?
>>
>> Thanks.
>>
>> --
>> M. Tamer Özsu
>> University of Waterloo
>>
>> (Currently on sabbatical leave at ETH Zürich)
>
> No, you're doing nothing wrong. The PDF is a file, it's not a bibliography
> entry, so where would it get the fields from? (and if your answer is "from
> the PDF", then the answer back is How?)
>
> Christiaan
Re: [Bibdesk-users] Importing PDF files
On Oct 11, 2011, at 22:48, M. Tamer Özsu wrote:
> I think I am doing something wrong and would appreciate any help. I thought
> it was possible to drag and drop a PDF file on the bibdesk library, but
> although the PDF is included as an entry, it has empty title (all the fields
> are empty in fact). I am dropping it to the top window when bibdesk is open.
> Am I doing something wrong somewhere?
>
> Thanks.
>
> --
> M. Tamer Özsu
> University of Waterloo
>
> (Currently on sabbatical leave at ETH Zürich)

No, you're doing nothing wrong. The PDF is a file, it's not a bibliography entry, so where would it get the fields from? (and if your answer is "from the PDF", then the answer back is How?)

Christiaan
[Bibdesk-users] Importing PDF files
I think I am doing something wrong and would appreciate any help. I thought it was possible to drag and drop a PDF file on the bibdesk library, but although the PDF is included as an entry, it has empty title (all the fields are empty in fact). I am dropping it to the top window when bibdesk is open. Am I doing something wrong somewhere?

Thanks.

--
M. Tamer Özsu
University of Waterloo
(Currently on sabbatical leave at ETH Zürich)
Re: [Bibdesk-users] importing pdf files
Dear Adam,

Thank you for your reply, and sorry for my late response. The approach you suggested worked well! For the record, here is what I did.

I changed the file field in each entry from the JabRef form

File = {FILENAME.pdf:FILEDIRECTORYNAME/FILENAME.pdf:PDF},

to the BibDesk form

Local-Url = {FILEDIRECTORYNAME/FILENAME.pdf},

and then opened the file in BibDesk. One warning appeared, which I confirmed. After that, all the files were linked.

I also generated a new cite key for every entry
(e.g. %a1_%f{Journal}20_%Y => Author_Journal_Year)
and auto-filed all the PDFs
(e.g. %f{Cite Key}50_v%n%e => Author_Journal_Year_v#.pdf).

Finally I got what I wanted. Thank you so much!

BR,
masa

> On Jul 23, 2010, at 9:47 PM, Masahiro Takahashi wrote:
>
>> I'm a switcher from JabRef to BibDesk.
>> If there are documentations for switching, let me know.
>> (Of course, I searched it already. But I might be bad at it...)
>>
>> What I really want to do is linking the pdf files automatically,
>> since I have .bib file which has a field called
>> "file"
>> which defines the place of pdf file, for example,
>> file = {InouyeNature1998.pdf:Article/InouyeNature1998.pdf:PDF}.
>> I've created "Article" directory and put all the files in it.
>
> If you can create a Local-Url field with the full path to your
> file, BibDesk will recognize that and convert it to the file
> alias that it uses internally. Offhand, I'd probably try doing
> this in a couple of passes with a text editor that supports
> regular expressions with backreferences, but I suspect you could
> also do it in BibDesk with a combination of cmd-shift-f
> Advanced Find & Replace and AppleScript to rename the field.
>
> Just make sure to operate on a backup of your .bib file. Also,
> BibDesk won't actually move any of your files unless you manually
> tell it to autofile them, so this is a fairly low-risk operation.
>
> --
> Adam
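The field rewrite masa describes could also be scripted rather than done by hand in a text editor. The following Python sketch is one possible way to do it, not the method anyone in this thread actually used: the function name jabref_to_bibdesk and the base_dir parameter are hypothetical, and it assumes each file field holds exactly one DESCRIPTION:PATH:TYPE triple with no colons or braces inside the path. As Adam advises, run anything like this on a backup copy of the .bib file.

```python
import re

# Matches a JabRef-style field such as
#   file = {Name.pdf:Article/Name.pdf:PDF}
# Group 1 is the middle part, i.e. the path to the file.
JABREF_FILE = re.compile(
    r'\bfile\s*=\s*\{[^:{}]*:([^:{}]+):[^:{}]*\}',
    re.IGNORECASE,
)


def jabref_to_bibdesk(bibtex, base_dir=""):
    """Rewrite JabRef file fields in bibtex as BibDesk Local-Url fields.

    base_dir (hypothetical parameter) is prepended to relative paths,
    since Local-Url is expected to hold the full path to the file.
    """
    def replace(match):
        path = match.group(1)
        if base_dir and not path.startswith("/"):
            path = base_dir.rstrip("/") + "/" + path
        return f"Local-Url = {{{path}}}"

    return JABREF_FILE.sub(replace, bibtex)


if __name__ == "__main__":
    entry = "file = {InouyeNature1998.pdf:Article/InouyeNature1998.pdf:PDF},"
    # "/Users/masa/Papers" is a made-up directory used only for illustration.
    print(jabref_to_bibdesk(entry, base_dir="/Users/masa/Papers"))
```

Fields that do not fit the assumed three-part shape are left untouched, so they can be reviewed by hand afterwards.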
Re: [Bibdesk-users] importing pdf files
Dear Masa,

On my new BibDesk AppleScript website you will find some AppleScripts that might do the job after only minor adjustments:

http://se-server.ethz.ch/staff/af/bibdesk/index.html

Regards,
Andreas

On 23/Jul/2010, at 10:11 , Masahiro Takahashi wrote:
> Hi all,
>
> I'm a switcher from JabRef to BibDesk.
> If there are documentations for switching, let me know.
> (Of course, I searched it already. But I might be bad at it...)
>
> What I really want to do is linking the pdf files automatically,
> since I have .bib file which has a field called
> "file"
> which defines the place of pdf file, for example,
> file = {InouyeNature1998.pdf:Article/InouyeNature1998.pdf:PDF}.
> I've created "Article" directory and put all the files in it.
>
> And also, I put "Cite key" (which is called "Bibtexkey" in JabRef) for
> the name of the pdf files such as
> LastnameJournalYear.pdf
> if "Cite key" is "LastnameJournalYear".
>
> Is there any efficient way of switching from JabRef?
>
> Thank you for your advice in advance!
>
> Sincerely,
> masa
Re: [Bibdesk-users] importing pdf files
On Jul 23, 2010, at 9:47 PM, Masahiro Takahashi wrote:
> I'm a switcher from JabRef to BibDesk.
> If there are documentations for switching, let me know.
> (Of course, I searched it already. But I might be bad at it...)
>
> What I really want to do is linking the pdf files automatically,
> since I have .bib file which has a field called
> "file"
> which defines the place of pdf file, for example,
> file = {InouyeNature1998.pdf:Article/InouyeNature1998.pdf:PDF}.
> I've created "Article" directory and put all the files in it.

If you can create a Local-Url field with the full path to your file, BibDesk will recognize that and convert it to the file alias that it uses internally. Offhand, I'd probably try doing this in a couple of passes with a text editor that supports regular expressions with backreferences, but I suspect you could also do it in BibDesk with a combination of cmd-shift-f Advanced Find & Replace and AppleScript to rename the field.

Just make sure to operate on a backup of your .bib file. Also, BibDesk won't actually move any of your files unless you manually tell it to autofile them, so this is a fairly low-risk operation.

--
Adam
[Bibdesk-users] importing pdf files
Hi all,

I'm switching from JabRef to BibDesk. If there is any documentation on switching, please let me know. (Of course, I have searched for it already, but I might be bad at it...)

What I really want to do is link the PDF files automatically. My .bib file has a field called "file" which gives the location of each PDF file, for example,

file = {InouyeNature1998.pdf:Article/InouyeNature1998.pdf:PDF}.

I have created an "Article" directory and put all the files in it.

I also use the "Cite key" (called "Bibtexkey" in JabRef) as the name of the PDF file, e.g. LastnameJournalYear.pdf if the "Cite key" is "LastnameJournalYear".

Is there any efficient way of switching from JabRef?

Thank you in advance for your advice!

Sincerely,
masa