Otávio Vasques created ARROW-7727:

             Summary: Unable to read a ParquetDataset when schema validation is 
                 Key: ARROW-7727
                 URL: https://issues.apache.org/jira/browse/ARROW-7727
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
            Reporter: Otávio Vasques
             Fix For: 0.16.0

I was trying to read a subset of my Parquet files using the ParquetDataset 
object with a predefined schema. When it tries to validate the schema, 
`to_arrow_schema` is called, and the schema object does not support this. I 
don't know what is happening; this is a sample:


``` python
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import numpy as np

schema = pa.schema([
    pa.field("field1", pa.string()),
    pa.field("field2", pa.string()),
    pa.field("field3", pa.string()),
])

# file_groups is built elsewhere in my code (not shown here)
pq_dataset = pq.ParquetDataset(file_groups[0], schema=schema)
```

This raises:

AttributeError: 'pyarrow.lib.Schema' object has no attribute 'to_arrow_schema'

If we check the type of the schema defined above, we get `pyarrow.lib.Schema` 
(the type named in the error). But the required type according to the docs is 
`pyarrow.parquet.Schema`. I don't know how to produce an object of this type, 
since we are forbidden to use the Schema constructor directly.

If we check the implementation on GitHub, we find this line, which calls the 
method directly:
dataset_schema = self.schema.to_arrow_schema()

Is this a problem in the schema builder or the parquet dataset object?

