dosubot[bot] commented on issue #29915: URL: https://github.com/apache/superset/issues/29915#issuecomment-2283422296
<!-- Greeting --> @jelu Hello, I'm here to help you with any bugs or questions you have. I can also assist you in becoming a contributor.

<!-- Answer -->
To resolve the unique-constraint error on table names when importing an example dataset in Superset, you can relax the uniqueness constraint to cover `cluster_name` together with `datasource_name`. This scopes the constraint to the database/source rather than applying it globally to the table name alone. Here is the relevant part of the migration script that addresses this (an excerpt — `conv`, `datasources`, and the `generic_find_*` helpers are defined at module level in the same migration file):

```python
def upgrade():
    bind = op.get_bind()
    insp = sa.engine.reflection.Inspector.from_engine(bind)

    # Add the new, less restrictive uniqueness constraint.
    with op.batch_alter_table("datasources", naming_convention=conv) as batch_op:
        batch_op.create_unique_constraint(
            "uq_datasources_cluster_name",
            ["cluster_name", "datasource_name"],
        )

    # Augment the tables which have a foreign key constraint related to the
    # datasources.datasource_name column.
    for foreign in ["columns", "metrics"]:
        with op.batch_alter_table(foreign, naming_convention=conv) as batch_op:
            # Add the datasource_id column with the relevant constraints.
            batch_op.add_column(sa.Column("datasource_id", sa.Integer))
            batch_op.create_foreign_key(
                f"fk_{foreign}_datasource_id_datasources",
                "datasources",
                ["datasource_id"],
                ["id"],
            )

        # Helper table for database migration using minimal schema.
        table = sa.Table(
            foreign,
            sa.MetaData(),
            sa.Column("id", sa.Integer, primary_key=True),
            sa.Column("datasource_name", sa.String(255)),
            sa.Column("datasource_id", sa.Integer),
        )

        # Migrate the existing data.
        for datasource in bind.execute(datasources.select()):
            bind.execute(
                table.update()
                .where(table.c.datasource_name == datasource.datasource_name)
                .values(datasource_id=datasource.id)
            )

        with op.batch_alter_table(foreign, naming_convention=conv) as batch_op:
            # Drop the datasource_name column and associated constraints.
            names = generic_find_fk_constraint_names(
                foreign, {"datasource_name"}, "datasources", insp
            )
            for name in names:
                batch_op.drop_constraint(
                    name or f"fk_{foreign}_datasource_name_datasources",
                    type_="foreignkey",
                )
            batch_op.drop_column("datasource_name")

    try:
        # Drop the old, more restrictive uniqueness constraint.
        with op.batch_alter_table("datasources", naming_convention=conv) as batch_op:
            batch_op.drop_constraint(
                generic_find_uq_constraint_name(
                    "datasources", {"datasource_name"}, insp
                )
                or "uq_datasources_datasource_name",
                type_="unique",
            )
    except Exception as ex:
        logging.warning(
            "Constraint drop failed, you may want to do this "
            "manually on your database. For context, this is a known "
            "issue around nondeterministic constraint names on Postgres "
            "and perhaps more databases through SQLAlchemy."
        )
        logging.exception(ex)
```

Additionally, when loading example datasets such as "birth_names", the `load_data` function in the `superset/examples/birth_names.py` script handles writing the data into the specified table name:

```python
def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
    url = get_example_url("birth_names2.json.gz")
    pdf = pd.read_json(url, compression="gzip")
    if database.backend == "presto":
        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
        pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M:%S")
    else:
        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
    pdf = pdf.head(100) if sample else pdf

    with database.get_sqla_engine() as engine:
        schema = inspect(engine).default_schema_name
        pdf.to_sql(
            tbl_name,
            engine,
            schema=schema,
            if_exists="replace",
            chunksize=500,
            dtype={
                "ds": DateTime if database.backend != "presto" else String(255),
                "gender": String(16),
                "state": String(10),
                "name": String(255),
            },
            method="multi",
            index=False,
        )
    print("Done loading table!")
    print("-" * 80)
```

This script ensures that the table name
`tbl_name` does not collide with existing tables in the specified database schema: `to_sql` is called with `if_exists="replace"`, so reloading an example overwrites any existing table of that name instead of failing <sup>[[1]](https://github.com/apache/superset/blob/master/superset/migrations/versions/2017-10-03_14-37_4736ec66ce19_.py)</sup><sup>[[2]](https://github.com/apache/superset/blob/master/superset/migrations/versions/2015-09-21_17-30_4e6a06bad7a8_init.py)</sup><sup>[[3]](https://github.com/apache/superset/blob/master/superset/examples/birth_names.py)</sup>.

<!-- Dosu Comment Footer -->
*To continue talking to [Dosu](https://dosu.dev), mention @dosu.*

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org