Hi all,

As part of its recent winter update, the Protein Data Bank in Europe (PDBe; http://pdbe.org/) has improved its facility for tandem searches of PDB and EMDB. It was designed to let users carry out many of their day-to-day searches without the need to fill out a complex form or learn a special query syntax. Simply type what you are looking for, click the SEARCH button, and we will do our best to dig up relevant information, be it in the PDB, in EMDB or on our website.

QUICK ACCESS TO ENTRIES, SERVICES, SEQUENCES
--------------------------------------------

If you go to the PDBe home page (http://pdbe.org/), you will see a Google-like search box in the friendly green banner near the top of the page (just below our motto, "Bringing Structure to Biology"). You can use this search box in a number of ways:

- type a PDB code (e.g., 1cbs), and you will be taken directly to the summary page for that entry. You can type any valid code, even if it's not in the current release, so you can use this facility to obtain information about the status of entries that have not been released yet (e.g., 2yd0) or entries that are no longer in the archive (e.g., theoretical models).

- type a valid EMDB code (e.g., 1607) and you will be taken straight to the summary page for that entry.

HINT: if, instead of being taken directly to a summary page for a certain PDB or EMDB code, you want to actually search PDB and/or EMDB for references to that particular code, simply enclose it in double quotes. For instance, searching for 1mi6 will take you to the summary page for PDB entry 1mi6, whereas searching for "1mi6" will give you a set of hits in both PDB and EMDB that all contain a reference to 1mi6.

- type something resembling a PDBe service or resource name and chances are that the name will be recognised and you will be taken straight to that service or resource (e.g., autodep, emdep, pdbemotif, pdbepisa, pdbefold, pdbechem, quips, portfolio, etc.).

- you can search the protein sequences in the PDB by entering seq: (or sequence:) followed by a (partial) amino-acid sequence in one-letter code (e.g., seq:GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTA). The sequence will be compared to all protein sequences in the PDB using FastA, and the results will be presented to you for further analysis in the PDBe sequence browser (see http://pdbe.org/sequence).
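
To make the rules above concrete, here is a minimal Python sketch of how a query typed into the box could be sorted into the cases described. This is an illustration only, not the PDBe implementation: the function name classify_query, the regular expressions and the returned labels are all assumptions, as is the choice to treat a four-digit query as an EMDB code rather than a PDB code. Recognition of service names (autodep, pdbefold, etc.) is left out.

```python
import re

# Assumed patterns for illustration; not taken from the PDBe code base.
PDB_CODE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")   # e.g. 1cbs, 2yd0
EMDB_CODE = re.compile(r"^[0-9]{4}$")             # e.g. 1607
SEQ_PREFIX = re.compile(r"^(?:seq|sequence):(.+)$", re.IGNORECASE)


def classify_query(query):
    """Return a (kind, value) pair describing how a query might be routed."""
    q = query.strip()
    # Double quotes force a search, even if the content looks like a code.
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return ("phrase_search", q[1:-1])
    m = SEQ_PREFIX.match(q)
    if m:
        return ("sequence_search", m.group(1).strip().upper())
    # Assumption: an all-digit, four-character query is an EMDB code.
    if EMDB_CODE.match(q):
        return ("emdb_entry", q)
    if PDB_CODE.match(q):
        return ("pdb_entry", q.lower())
    return ("text_search", q)


for example in ["1cbs", "1607", '"1mi6"', "seq:GNAAAAKKGSEQ", "Jones"]:
    print(example, "->", classify_query(example))
```

The quoted case is tested first so that "1mi6" in double quotes is searched for rather than resolved to a summary page, as in the HINT above.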

TEXT-BASED SEARCHES
-------------------

Of course you can do general text-based searches of the PDB and EMDB as well - just type one or more search terms in the box and hit the SEARCH button.

- If you type a single search term and it gives hits in the PDB, you will get a results page with a tree structure on the left which shows in which categories the term was found. For instance, if you look for Jones, that could be an author, but it could also be part of the name of a molecule (e.g., Bence Jones protein). By clicking on an appropriate branch in the tree, you select only those entries for which the search term occurs in that data category (e.g., author or PDB compound).

- If you type more than one search term, only entries that contain all these terms will be selected as hits. For instance, if you search for "kleywegt po4" - without the quotes - you will get only one hit, 1CBQ. Note that if you enclose your search terms in double quotes, you will only get hits that match exactly (i.e., the complete search expression must occur somewhere in the entry, not just all of the keywords individually). For instance, searching for "HCV NS3 protease" yields 31 hits in the PDB if you enclose the terms in double quotes, but 177 hits if you don't.
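
The difference between the two behaviours (all terms required versus exact phrase) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the PDBe search is implemented: the helper names tokens and matches are hypothetical, matching is assumed to be case-insensitive and on whole words, and an entry is represented by a plain string.

```python
import re


def tokens(text):
    """Split text into lower-case alphanumeric words (assumed tokenisation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(entry_text, query):
    """True if the entry is a hit for the query under the rules described."""
    words = tokens(entry_text)
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        # Quoted: the complete expression must occur as consecutive words.
        phrase = tokens(q[1:-1])
        n = len(phrase)
        return n > 0 and any(
            words[i:i + n] == phrase for i in range(len(words) - n + 1)
        )
    # Unquoted: every term must occur somewhere, in any order.
    terms = tokens(q)
    return bool(terms) and all(t in words for t in terms)


entry = "Structure of the HCV protease domain of NS3"  # made-up entry text
print(matches(entry, "HCV NS3 protease"))     # all three terms are present
print(matches(entry, '"HCV NS3 protease"'))   # but not as one exact phrase
```

In this model a quoted query can never yield more hits than the same query without quotes, which is consistent with the 31 versus 177 hits mentioned above.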

Note that there are two tabs on the results page - one labelled "PDB entries" and the other "EMDB entries". If you do a search for Baumeister, you will get 14 hits in the PDB. If you click on the "EMDB entries" tab, you will find that there are 10 hits in EMDB.

HINT: if you want the EMDB results tab to become active straightaway, preface your search term(s) by "emdb:" (without the quotes), e.g. search for emdb:saibil and you will immediately get the list of 56 EMDB hits.

SEARCH RESULTS
--------------

The search results are sorted by release date by default, with the most recently released entries at the top. This ensures that if you read an exciting paper about new ClpC structures, a search for clpc will give you the latest entries first. You can change the sort order and criterion with a drop-down menu.

Each entry that is found as a hit in a search is shown in a panel that contains useful summary information and allows you to launch various searches and services with a single mouse-click. If you do a search for hiv-1, for example, you will get many hits in the PDB and two dozen in EMDB:

- For each PDB hit you will see: the PDB code, a small image of the structure, the resolution (for X-ray and EM structures), the title of the entry, a set of PDBprints that provide at-a-glance information about the entry (see http://pdbe.org/pdbprints). Two action buttons are also shown: "Entry summary" and "Download PDB file" - when pressed, they will do what they promise. If you click on "More ..." (or on "Expand all ..." at the top of the results tab) you will see even more information, namely the release date, information about the publication describing the structure, possible cross-references to EMDB entries and four more action buttons ("Quick links to related PDBe services"), namely:

  * "Download other files" (takes you to a download page with mmCIF files, experimental data files, etc.),
  * "Quaternary structure" (which takes you straight to the PISA results about probable assemblies),
  * "Similar structures" (which will automatically launch an SSM/PDBeFold search of the PDB to look for structures with similar folds),
  * "Motifs and sites" (which will take you to the PDBeMotif analysis of the structure - this may not always work for very recent entries, but we are working on solving this issue).

- For each EMDB hit you will see information similar to that shown for the PDB hits. Instead of a PDB file, there will be an action button to "Download header file". If the EM map/tomogram has been released, there will also be a button to download it. If you click "More ..." you will sometimes see "Other EMDB entries from this publication" (if one paper describes more than one EMDB entry).

NOTE: if you search for hiv-1 today, you will note that the top 2 EMDB hits have release dates of 29 March 2011, but the maps are not yet available for download. This has to do with the way entries are released in practice (EMDB and PDB use a weekly release cycle). Once the release date has arrived, an EMDB entry will be flagged (for release) on the first Thursday following the release date, which means the map will become available in the next weekly release (which will be on the first Wednesday after that Thursday).
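
The timing described in this note can be worked out with a little date arithmetic. The Python sketch below is only an illustration of the rule as stated here; the function names are made up, and it assumes that "the first Thursday following" means strictly after the release date (the behaviour when the release date itself falls on a Thursday is a guess).

```python
from datetime import date, timedelta

WEDNESDAY, THURSDAY = 2, 3  # date.weekday() numbering, Monday is 0


def next_weekday(day, weekday):
    """First date strictly after `day` that falls on the given weekday."""
    days_ahead = (weekday - day.weekday() - 1) % 7 + 1
    return day + timedelta(days=days_ahead)


def map_availability(release_date):
    """Return (flagging date, date the map becomes available)."""
    # Assumption: flagging happens on the first Thursday strictly after
    # the release date.
    flagged = next_weekday(release_date, THURSDAY)
    # The map appears in the weekly release on the following Wednesday.
    available = next_weekday(flagged, WEDNESDAY)
    return flagged, available


flagged, available = map_availability(date(2011, 3, 29))
print(flagged.isoformat(), available.isoformat())  # 2011-03-31 2011-04-06
```

For the two hiv-1 entries with a release date of 29 March 2011 (a Tuesday), this gives flagging on Thursday 31 March and availability of the maps on Wednesday 6 April 2011.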

LIMITATIONS AND SEARCH TIPS
---------------------------

As you have seen, the Google-like box in the PDBe banner allows you to carry out many standard searches quickly and accurately, but with some limitations, such as:

- you cannot use regular expressions or operators such as NOT, AND and OR
- you cannot use wildcards (e.g., searching for "vanil*" will not give any hits)
- if you search by author name, you will get better results if you only provide the surname (searching for "kleywegt gj" only returns one hit, and it's not a structure determined by that person)
- at present, there is no way of ranking the results by relevance (the search includes PDB keywords and the PubMed abstract, both of which can lead to false positives)

In general, the more search terms you enter, the fewer results you will get (as they are all required to occur). Useful search terms are (combinations of) the following:

- surnames of authors (e.g., rossmann, sixma, allerston, akke, walse)
- names of proteins (e.g., HMG CoA reductase, Lon protease, bacteriorhodopsin)
- (parts of) species names (e.g., "plasmodium falciparum")
- common names of chemical compounds (e.g., retinol, sildenafil, nadph)
- database identifiers from EC, PubMed, UniProt, etc. (e.g., entering 20890284 will retrieve two structures that have that number as their PubMed identifier; searching for CH60_ECOLI will retrieve 30 hits, 28 of which have that as a UniProt identifier; searching for 1.1.1.27 will return 105 hits, 63 of which contain a lactate dehydrogenase with that EC number)
- (part of) a valid GO term (e.g., "intracellular protein transport", "signal sequence", "anchored to membrane")

If you want to carry out more sophisticated searches, in which you can specify that you want to search for a term in a particular category of information (e.g., looking for "parkinson" in an abstract rather than as an author, or looking for "cancer" as part of a reference rather than a keyword), you can use the PDBe advanced search facilities. An action button labelled "Advanced search" is available between the green PDBe banner and the search results.

                                   -----

We welcome your comments, bug reports and feature requests on the "quick-and-dirty" PDBe search facility. Please use the feedback button at the top of any PDBe web page.

--Gerard

---
Gerard J. Kleywegt, PDBe, EMBL-EBI, Hinxton, UK
ger...@ebi.ac.uk ..................... pdbe.org
Secretary: Pauline Haslam  pdbe_ad...@ebi.ac.uk
